I. Abstract

Tactical wireless networks often comprise clusters of nodes, which are fed information from a head
node. Transmit antenna arrays mounted on the head node (e.g., an unmanned aerial vehicle) offer an
attractive means of boosting capacity and assuring quality of service through transmit beamforming. The
central  goal  of  our  research  was  to  investigate  efficient  multiuser  transmit  beamforming  strategies,  and 
develop  high-throughput  low-complexity  algorithms  that  will  meet  the  needs  of  future  tactical  wireless 
networks.  Sum  capacity,  quality  of  service,  and  fair  service  objectives  were  considered,  under  unicast 
and  multicast  scenarios.  A  key  innovation  of  our  work  is  the  concept  of  physical  layer  multicasting, 
which  affords  significant  capacity  gains.  A  number  of  effective  and  efficient  algorithms  were  developed, 
drawing upon and contributing to semidefinite relaxation (SDR) tools. Closely related added-value
topics of our research program included i) computationally efficient quasi-optimal multiple input multiple
output  detection  (using  lattice  search,  data  association,  and  SDR  tools);  ii)  accurate  and  scalable  node 
localization  from  pairwise  distance  estimates;  and  iii)  tracking  of  time-varying  carrier  signals  (using  and 
developing  associated  particle  filtering  tools).  Our  work  on  these  topics  has  been  reported  in  seven  (IEEE, 
SIAM) journal papers and seven IEEE conference papers. Variants of some of our published algorithms
are currently being considered for adoption by industry.

Keywords:  Transmit  beamforming,  minimization  of  radiation  power,  quality  of  service,  max-min  fair, 
sum capacity, broadcasting, multicasting, convex optimization, semidefinite programming, NP-hard
problems, semidefinite relaxation, lattice search, integer least squares, node localization, multidimensional
scaling,  tracking,  intercept,  particle  filtering 

II. Motivation and Problem Statement

Tactical  wireless  networks  must  seamlessly  support  diverse  services,  including  command  and 
control,  “bulk”  information  dissemination  (e.g.,  terrain  maps),  and  large-scale  surveillance  and 
sensing  (e.g.,  radar,  alien  signal  interception,  biochemical  sensor  networks).  These  come  with 
equally  diverse  service  needs:  guaranteed  quality  of  service  for  command  and  control,  very  high 
transmission  rates  for  bulk  information  dissemination,  reliable  detection  under  stringent  energy 
constraints for sensor networks. While truly seamless unified solutions are still a long way
off, a number of enabling communication technologies and concepts have emerged
at the center stage of network science, particularly for tactical networks. These include:
• The deployment of transmit antenna arrays, for assuring quality of service and/or higher data
rates through spatial multiplexing;

• Wireless multicasting, as a means of improving spectral utilization and assuring quick and
efficient delivery of mission-critical information;

• Effective strategies for vector decoding, as a means of improving spectral efficiency and
robustness to jamming;

• Node localization, for sensing, routing, fading channel estimation, and situational awareness;
and

• Carrier sensing and tracking, for signal intelligence, dynamic spectrum monitoring and access.

Our  work  in  this  project  addresses  many  important  aspects  of  the  aforementioned  enabling 
concepts  and  technologies.  While  many  of  our  contributions  specifically  target  applications  in 
tactical networks, some have a clear dual use, e.g., in 802.16e fixed wireless systems and 4G
cellular networks.

III. Methodology

Modern  convex  optimization  /  convex  approximation  underlies  most  of  our  work  in  this 
project.  Specifically,  semidefinite  programming  /  semidefinite  relaxation  forms  the  basis  of  our 
design approach. More conventional optimization tools and concepts (e.g., water-filling,
branch-and-bound) also come into play in certain algorithms, and particle filtering is the framework for
our work on tracking of time-varying carrier signals.

IV.  Results 

Our main results and findings are reviewed next, classified in four categories: multiuser
transmit beamforming (including sum capacity, quality of service, and fair service objectives);
multiple input multiple output decoding; node localization; and tracking of time-varying carrier
signals for synchronization, Doppler estimation, and signal intelligence applications. Conclusions
are  drawn  and  recommendations  are  made  in  the  following  section. 

A.  Multiuser  Transmit  Beamforming 
A.1 Sum capacity objective

Multiuser  transmit  beamforming  forms  the  core  of  our  work  under  this  project.  The  idea  is  to 
employ  a  transmit  antenna  array  to  create  multiple  beams  directed  towards  the  individual  users, 
in  order  to  increase  the  attainable  throughput,  as  measured  by  sum  capacity.  In  particular,  we  are 
interested  in  the  practically  important  case  of  more  users  than  transmit  antennas,  which  requires 
user selection. Optimal solutions to this problem can be prohibitively complex for online
implementation at the access point and entail so-called Dirty Paper (DP) precoding for known
interference. Suboptimal solutions capitalize on multiuser (selection) diversity to achieve a significant
fraction of sum capacity at lower complexity cost. We analyzed the throughput performance in
Rayleigh  fading  of  a  suboptimal  greedy  DP-based  scheme  proposed  by  Tu  and  Blum.  We  also 
proposed  another  user-selection  method  of  the  same  computational  complexity  based  on  simple 
zero-forcing beamforming. Our results indicate that the proposed method attains a significant
fraction of sum capacity, similar to Tu and Blum's scheme, but at a much lower overall
(design plus implementation) complexity; it thus offers an attractive alternative to DP-based
schemes.
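
The greedy user-selection idea can be illustrated with a short sketch. This is not the exact algorithm analyzed or proposed in the report; it is a generic projection-based greedy rule, assumed here for illustration, that repeatedly picks the user whose channel has the largest component orthogonal to the channels already selected. The function name `greedy_select` and the list-of-lists channel layout are hypothetical.

```python
def greedy_select(H, n_select):
    """Greedy user selection by orthogonal-projection norm (illustrative sketch).

    H is a list of per-user channel vectors (sequences of complex numbers).
    Returns the indices of the selected users, in selection order.
    """
    # residual[k] holds the part of user k's channel that is orthogonal
    # to the span of the channels selected so far.
    residual = [list(h) for h in H]
    selected = []
    for _ in range(min(n_select, len(H))):
        norms = [sum(abs(c) ** 2 for c in r) for r in residual]
        for k in selected:
            norms[k] = -1.0  # never re-select a user
        best = max(range(len(H)), key=norms.__getitem__)
        if norms[best] <= 1e-12:
            break  # remaining channels add no new spatial dimension
        selected.append(best)
        scale = norms[best] ** 0.5
        q = [c / scale for c in residual[best]]  # new orthonormal direction
        # Gram-Schmidt step: remove the q-component from every residual.
        for r in residual:
            proj = sum(qc.conjugate() * rc for qc, rc in zip(q, r))
            for i in range(len(r)):
                r[i] -= proj * q[i]
    return selected
```

For example, with channels `[[3, 0], [2.9, 0.1], [0, 1]]` and two antennas, the strongest user is picked first, and the second pick is the user nearly orthogonal to it rather than the second-strongest one, which is almost collinear with the first.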

A.2  Multicasting  under  Quality  of  Service  (QoS)  and  Max-min  Fair  (MMF)  objectives 

Next, we considered the problem of transmit beamforming in the context of common
information broadcasting or multicasting applications, wherein channel state information (CSI) is
available at the transmitter. Unlike the usual blind isotropic broadcasting scenario, the
availability of CSI allows transmit optimization. A minimum transmission power criterion was adopted,
subject to prescribed minimum received signal-to-noise ratios (SNRs) at each of the intended
receivers. A related max-min SNR fair problem formulation was also considered subject to a
transmitted power constraint. It was proven that both problems are NP-hard; however, suitable
reformulation allows the successful application of semidefinite relaxation (SDR) techniques. SDR
yields  an  approximate  solution  plus  a  bound  on  the  optimum  value  of  the  associated  cost/reward. 
SDR  was  motivated  from  a  Lagrangian  duality  perspective,  and  its  performance  was  assessed 
via  pertinent  simulations  for  the  case  of  Rayleigh  fading  wireless  channels.  We  found  that  SDR 
typically  yields  solutions  that  are  within  3  to  4  dB  of  the  optimum,  which  is  often  good  enough 
in  practice.  In  several  scenarios,  SDR  generates  exact  solutions  that  meet  the  associated  bound 
on the optimum value. This was illustrated using far-field beamforming for a uniform linear
transmit antenna array. Interestingly, these numerical experiments effectively led us to discover
new  and  exact  convex  reformulations  of  the  basic  problem,  via  spectral  factorization,  applicable 
when  the  channel  vectors  are  Vandermonde. 
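
For concreteness, the QoS formulation and its semidefinite relaxation can be written as follows. The notation is assumed here for illustration and is not taken from the report: w is the beamforming vector for n transmit antennas, h_i is the channel vector of receiver i, \sigma_i^2 its noise power, \rho_i its prescribed minimum SNR, and m the number of receivers.

```latex
% QoS multicast beamforming (nonconvex, NP-hard in general):
\min_{w \in \mathbb{C}^n} \ \|w\|_2^2
\quad \text{s.t.} \quad |w^H h_i|^2 \ge \rho_i \sigma_i^2, \qquad i = 1, \dots, m.

% Let X = w w^H and Q_i = h_i h_i^H / (\rho_i \sigma_i^2).
% Dropping the constraint rank(X) = 1 yields the semidefinite relaxation:
\min_{X \succeq 0} \ \operatorname{tr}(X)
\quad \text{s.t.} \quad \operatorname{tr}(X Q_i) \ge 1, \qquad i = 1, \dots, m.
```

The optimal value of the relaxed problem is a lower bound on the minimum transmit power. When the relaxed solution happens to have rank one, its principal component solves the original problem exactly; otherwise a feasible beamformer has to be extracted from it, for example by randomization followed by rescaling.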

We also analyzed the approximation performance of the aforementioned broadcast beamforming
algorithms theoretically. In particular, we showed that SDR provides an $O(m^2)$ approximation
in the real case, and an $O(m)$ approximation in the complex case, where m is the total
number of receivers. Moreover, we showed that these bounds are tight up to a constant factor.
When the phase spread of the entries of the steering vectors is bounded away from $\pi/2$, we
further established a certain constant factor approximation (depending on the phase spread but
independent of the number of receivers, m, and the number of transmit antennas, n) for both
SDR and a convex quadratic programming restriction of the original NP-hard problem. Finally,
we considered a related problem of finding a maximum norm vector subject to m convex
homogeneous quadratic constraints. We showed that SDR provides an $O(1/\ln(m))$ approximation,
which is analogous to a result of Nemirovski, Roos and Terlaky for the real case.

Having  settled  the  case  of  a  single  multicast  group,  we  then  generalized  to  multiple  co-channel 
multicast groups. Two different design objectives were considered: minimizing total transmission
power while guaranteeing a prescribed minimum signal-to-interference-plus-noise ratio
(SINR)  at  each  receiver;  and  a  fair  approach  maximizing  the  overall  minimum  SINR  under  a 
total  power  budget.  The  core  problem  is  a  multicast  generalization  of  the  multiuser  downlink 
beamforming problem; the difference is that each transmitted stream is directed to multiple
receivers, each with its own channel. Such generalization is relevant and timely, e.g., in the context
of 802.16e wireless networks. The joint problem also contains single group multicast beamforming
as a special case. The latter (and therefore also the former) is NP-hard. This motivates the
pursuit of computationally efficient quasi-optimal solutions. It was shown that Lagrangian
relaxation coupled with suitable randomization / co-channel multicast power control loops yields
computationally efficient high-quality approximate solutions. For a significant fraction of
problem instances, the solutions generated this way are exactly optimal. Extensive numerical results
using  both  simulated  and  measured  wireless  channels  (courtesy  of  the  University  of  Alberta, 
Canada)  were  presented  to  corroborate  our  main  findings. 
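
The two multi-group design objectives can be stated compactly. The symbols below are assumed for illustration: there are G multicast groups with receiver index sets \mathcal{G}_1, ..., \mathcal{G}_G, w_k is the beamformer for group k, h_i and \sigma_i^2 are the channel and noise power of receiver i, \gamma_i is its target SINR, and P is the total power budget.

```latex
% QoS: minimize total power subject to per-receiver SINR targets.
\min_{\{w_k\}_{k=1}^{G}} \ \sum_{k=1}^{G} \|w_k\|_2^2
\quad \text{s.t.} \quad
\frac{|w_k^H h_i|^2}{\sum_{\ell \ne k} |w_\ell^H h_i|^2 + \sigma_i^2} \ge \gamma_i,
\qquad \forall i \in \mathcal{G}_k, \ k = 1, \dots, G.

% Max-min fair: maximize the worst (scaled) SINR under a total power budget.
\max_{\{w_k\}_{k=1}^{G}} \ \min_{k} \ \min_{i \in \mathcal{G}_k} \
\frac{1}{\gamma_i} \,
\frac{|w_k^H h_i|^2}{\sum_{\ell \ne k} |w_\ell^H h_i|^2 + \sigma_i^2}
\quad \text{s.t.} \quad \sum_{k=1}^{G} \|w_k\|_2^2 \le P.
```

With G = 1 the interference term vanishes and both problems reduce to the single-group multicast case, which is why NP-hardness carries over to the multi-group setting.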

Whereas multi-group multicast transmit beamforming under SINR constraints is NP-hard in
general, we have shown that, in the special case of Vandermonde steering vectors, it is in fact a
semidefinite program, which can be exactly and efficiently solved.

We  also  considered  various  robust  formulations  for  the  problem  of  single-group  multicasting, 
when  the  steering  vectors  are  only  approximately  known.  We  obtained  an  elegant  theoretical 
relationship between the optimal solutions of the original non-robust and associated robust
formulations of the problem: the two are related via a simple (albeit solution-dependent) scaling.
This  relationship  naturally  suggests  robust  multicast  beamforming  approximation  algorithms, 
through  semidefinite  relaxation  of  the  original  non-robust  version  of  the  problem. 

B.  Multiple  Input  Multiple  Output  Decoding 

Multiple input multiple output (MIMO) communication links are now common in both
commercial and tactical wireless networks, for spectral efficiency, fading, and jam-resilience
considerations. The associated optimum vector decoding problem is known to be NP-hard. We
developed two new computationally efficient MIMO decoding algorithms that afford very
competitive symbol error rate (SER) performance. The first algorithm is a judicious combination
of probabilistic data association (PDA) and sphere decoding (SD). The second is based on the
principle of semidefinite relaxation (SDR).

The  key  idea  behind  the  hybrid  PDA-SD  detector  is  to  reduce  the  dimension  of  the  problem 
solved  via  SD  by  first  running  a  single  stage  of  the  PDA  to  fix  symbols  that  can  be  decoded  with 
high  reliability.  This  two-step  algorithm  attains  a  considerably  better  performance-complexity 
tradeoff  than  SD  and  PDA  for  low  to  moderate  signal-to-noise  ratio  (SNR)  or  higher  problem 
dimensions. 
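
The dimension-reduction step can be written out as follows. The notation is assumed for illustration: y = Hs + v is the received vector, \mathcal{A} is the symbol alphabet, R indexes the symbols that the PDA stage fixes with high reliability (with hard decisions \hat{s}_R), and U indexes the remaining symbols.

```latex
% Full maximum-likelihood (integer least squares) detection over all n symbols:
\hat{s} = \arg\min_{s \in \mathcal{A}^n} \ \|y - H s\|_2^2 .

% After the PDA stage, partition H = [H_R \ H_U] and cancel the fixed symbols:
\tilde{y} = y - H_R \hat{s}_R .

% Sphere decoding is then run on the reduced problem of dimension |U| < n:
\hat{s}_U = \arg\min_{s_U \in \mathcal{A}^{|U|}} \ \|\tilde{y} - H_U s_U\|_2^2 .
```

Because the search cost of SD grows quickly with the problem dimension, especially at low SNR, shrinking the search from n to |U| symbols is where the complexity saving comes from.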

The  second  approach,  based  on  SDR,  has  been  specifically  developed  for  MIMO  systems 
employing high-order QAM constellations. The new approach affords improved detection
performance compared to existing solutions of comparable worst-case complexity order, which is
nearly cubic in the dimension of the transmitted symbol vector and independent of the
constellation order for uniform QAM, or affine in the constellation order for non-uniform QAM.

C.  Acquiring  Channel  State  Information:  Node  Localization 

Given  a  set  of  pairwise  distance  estimates  between  nodes,  it  is  often  of  interest  to  generate 
a  map  of  node  locations.  This  is  an  old  nonlinear  estimation  problem  that  has  recently  drawn 
interest  in  the  signal  processing  community,  due  to  the  emergence  of  wireless  sensor  networks. 
Sensor maps are useful for estimating the spatial distribution of measured phenomena
(including shadowing and fading), and for routing purposes. We proposed a two-stage algorithm that
combines  algebraic  initialization  and  gradient  descent.  In  particular,  we  borrowed  an  algebraic 
solution  known  as  Fastmap  from  the  database  literature  and  adapted  it  to  the  sensor  network 
context, using a specific choice of anchor/pivot nodes. The resulting estimates are fed to a
gradient descent iteration. The overall algorithm offers very competitive performance at significantly
lower complexity than existing solutions with similar estimation performance. For a certain
multiplicative measurement noise model that is often adopted in the literature, we also derived the
pertinent Cramér-Rao bound (CRB). Simulations indicate that the performance of our algorithm
is  close  to  the  CRB  when  the  network  is  (close  to)  fully  connected,  in  the  sense  that  every  node 
can  estimate  its  distance  from  all  (most)  other  nodes.  Our  adaptation  of  Fastmap  also  turns  out 
to  make  a  big  difference  when  used  to  initialize  other  iterative  distributed  estimation  algorithms 
that  have  been  developed  specifically  for  sparse  networks. 
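
A minimal sketch of the two-stage idea (algebraic initialization followed by gradient descent) is given below. It uses the textbook Fastmap projection and plain gradient descent on the squared-error stress; the pivot pairs are passed in by the caller, since the report's specific anchor/pivot selection rule is not reproduced here. The function names, step size, and iteration count are assumptions made for illustration.

```python
import math

def fastmap(D, pivots):
    """Embed n nodes from pairwise distances D using one pivot pair per dimension."""
    n = len(D)
    D2 = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    X = [[] for _ in range(n)]
    for a, b in pivots:
        dab2 = D2[a][b]
        if dab2 <= 0.0:
            coords = [0.0] * n  # degenerate pivot pair: no information
        else:
            # Cosine-law projection of each node onto the line through a and b.
            dab = math.sqrt(dab2)
            coords = [(D2[a][i] + dab2 - D2[b][i]) / (2.0 * dab) for i in range(n)]
        for i in range(n):
            X[i].append(coords[i])
        # Residual squared distances in the hyperplane orthogonal to the pivot line.
        D2 = [[max(D2[i][j] - (coords[i] - coords[j]) ** 2, 0.0)
               for j in range(n)] for i in range(n)]
    return X

def stress(X, D):
    """Sum over node pairs of (embedded distance - measured distance)^2."""
    n = len(X)
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def refine(X, D, steps=200, lr=0.05):
    """Gradient descent on the stress, starting from the initialization X."""
    n, dim = len(X), len(X[0])
    X = [row[:] for row in X]
    for _ in range(steps):
        G = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                diff = [X[i][k] - X[j][k] for k in range(dim)]
                dist = math.sqrt(sum(c * c for c in diff)) or 1e-12
                coef = 2.0 * (dist - D[i][j]) / dist
                for k in range(dim):
                    G[i][k] += coef * diff[k]
        for i in range(n):
            for k in range(dim):
                X[i][k] -= lr * G[i][k]
    return X
```

With noiseless Euclidean distances and non-degenerate pivots, the Fastmap stage alone already reproduces the geometry up to a rigid transformation; with noisy distances it only provides a starting point, and the descent stage reduces the remaining stress.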

D.  Synchronization,  Doppler,  and  Intercept  Issues:  Particle  Filtering  Tools 

In  collaboration  with  Dr.  Ananthram  Swami,  of  ARL/Adelphi,  we  also  investigated  problems 
in time-varying frequency estimation. These appear in numerous pertinent applications:
synchronization, Doppler frequency tracking, and signal intelligence, to name a few. We adopted a
particle filtering (PF) framework, and contributed closed-form solutions for the optimal
importance function, plus associated sampling procedures.

We first considered the problem of tracking the frequency and complex amplitude of a
frequency-hopped complex sinusoid, using a novel stochastic state-space formulation that is naturally
suited  for  the  application  of  PF  tools.  The  problem  is  of  considerable  interest  for  interference 
mitigation in frequency-hopped wireless networks, and for signal intelligence in military
communications. The proposed particle filtering approach has a number of desirable features. It
affords  high-resolution  estimates  of  carrier  frequency  and  hop  timing,  manageable  complexity 
(linear  in  the  number  of  processed  samples),  and  flexibility  in  tracking  signals  with  irregular 
hopping  patterns  due  to  intentional  timing  jitter.  The  proposed  state-space  model  is  not  only 
parsimonious,  but  fortuitous  as  well:  it  turns  out  that  the  associated  optimal  importance  function 
(that  minimizes  the  variance  of  the  particle  weights)  can  be  computed  in  closed  form,  and  thus 
samples from it can be drawn using rejection techniques. Both prior and optimal importance
sampling  versions  were  developed  and  illustrated  in  pertinent  simulations. 

Next, we turned our attention to the problem of tracking the frequency and complex
amplitude of a slowly time-varying (TV) harmonic signal. Similar to previous PF approaches to TV
spectral analysis, we assumed that the frequency and complex amplitude evolve according to a
Gaussian AR(1) model; but we concentrated on the important special case of a single TV
harmonic. For this case, we showed that the optimal importance function can be computed in closed
form.  We  also  developed  a  suitable  procedure  to  sample  from  the  optimal  importance  function. 
The  end  result  is  a  custom  PF  solution  that  is  more  efficient  than  generic  ones,  and  can  be  used 
in  a  broad  range  of  important  applications  that  postulate  a  single  TV  harmonic  component,  e.g., 
TV  Doppler  estimation  in  communications  and  radar. 
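
To make the PF setting concrete, the sketch below tracks the frequency of a single harmonic with a generic bootstrap particle filter, i.e., one that samples from the prior (transition) density. It does not implement the closed-form optimal importance function developed in this work. The state model (a random-walk frequency, which is AR(1) with unit coefficient, plus accumulated phase), the known constant amplitude, and all parameter names and default values are simplifying assumptions made for illustration.

```python
import cmath
import math
import random

def bootstrap_pf(y, n_particles=500, sigma_w=0.01, sigma_v=0.1, amp=1.0, seed=0):
    """Track the frequency (rad/sample) of y[k] = amp * exp(j * phase_k) + noise.

    Assumed model: omega_k = omega_{k-1} + w_k, phase_k = phase_{k-1} + omega_k,
    with w_k ~ N(0, sigma_w^2) and complex Gaussian observation noise.
    Returns one weighted-mean frequency estimate per sample.
    """
    rng = random.Random(seed)
    omega = [rng.uniform(-math.pi, math.pi) for _ in range(n_particles)]
    phase = [0.0] * n_particles
    estimates = []
    for yk in y:
        # Propagate each particle through the prior (transition) density.
        omega = [w + rng.gauss(0.0, sigma_w) for w in omega]
        phase = [p + w for p, w in zip(phase, omega)]
        # Log-likelihood of the observation for each particle.
        logw = [-abs(yk - amp * cmath.exp(1j * p)) ** 2 / sigma_v ** 2
                for p in phase]
        # Subtract the maximum before exponentiating to avoid underflow.
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * om for w, om in zip(weights, omega)))
        # Multinomial resampling at every step.
        idx = rng.choices(range(n_particles), weights=weights, k=n_particles)
        omega = [omega[i] for i in idx]
        phase = [phase[i] for i in idx]
    return estimates
```

Sampling from the prior ignores the current observation when proposing particles, so many particles can end up with negligible weight. The optimal importance function conditions the proposal on the observation, which is the source of the efficiency gain described above.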

E.  Publications 

Summarizing  the  status  of  journal  papers  (5  appeared/accepted  +  2  submitted  for  publication 
=  7  overall): 

1. E. Karipidis, N.D. Sidiropoulos, Z.-Q. Luo, “Quality of Service and Max-min-fair Transmit
Beamforming to Multiple Co-channel Multicast Groups”, submitted to IEEE Trans. on Signal
Processing, July 2006.

2. G. Latsoudas, N.D. Sidiropoulos, “A Fast and Effective Multidimensional Scaling Approach
for Node Localization in Wireless Sensor Networks”, submitted to IEEE Trans. on Signal
Processing, July 2006.

3.  N.D.  Sidiropoulos,  Z.-Q.  Luo,  “A  Semidefinite  Relaxation  Approach  to  MIMO  Detection  for 
High-order  QAM  Constellations”,  IEEE  Signal  Processing  Letters,  to  appear. 

4. Z.-Q. Luo, N.D. Sidiropoulos, P. Tseng, S. Zhang, “Approximation Bounds for Quadratic
Optimization with Homogeneous Quadratic Constraints”, SIAM Journal on Optimization, to
appear.

5. N.D. Sidiropoulos, T.N. Davidson, Z.-Q. (Tom) Luo, “Transmit Beamforming for Physical
Layer Multicasting”, IEEE Trans. on Signal Processing, 54(6, Part 1):2239-2251, June 2006.

6. G. Latsoudas, N.D. Sidiropoulos, “A Hybrid Probabilistic Data Association - Sphere
Decoding Detector for Multiple-Input Multiple-Output Systems”, IEEE Signal Processing Letters,
12(4):309-312, Apr. 2005.

7.  G.  Dimic,  N.D.  Sidiropoulos,  “On  Downlink  Beamforming  with  Greedy  User  Selection: 
Performance Analysis and a Simple New Algorithm”, IEEE Trans. on Signal Processing,
53(10):3857-3868,  Oct.  2005. 

Regarding  conference  papers  (7  appeared/accepted;  2  in  collaboration  with  Ananthram  Swami, 
ARL/Adelphi,  MD): 

1. E. Tsakonas, N.D. Sidiropoulos, A. Swami, “Time-Frequency Analysis Using Particle
Filtering: Closed-form Optimal Importance Function and Sampling Procedure for a Single
Time-varying Harmonic”, in Proc. Nonlinear Statistical Signal Processing Workshop: Classical,
Unscented, and Particle Filtering Methods, Sep. 13-15, 2006, Corpus Christi College, Cambridge,
U.K., to appear.

2.  N.D.  Sidiropoulos,  A.  Swami,  A.  Valyrakis,  “Tracking  a  Frequency-Hopped  Signal  Using 
Particle  Filtering”,  Proc.  IEEE  ICASSP  2006,  May  14-19,  2006,  Toulouse,  France. 

3. E. Karipidis, N.D. Sidiropoulos, Z.-Q. (Tom) Luo, “Convex Transmit Beamforming For
Downlink Multicasting to Multiple Co-channel Groups”, Proc. IEEE ICASSP 2006, May 14-19,
2006, Toulouse, France.

4. E. Karipidis, N.D. Sidiropoulos, Z.-Q. (Tom) Luo, “Transmit Beamforming to Multiple
Co-channel Multicast Groups”, in Proc. IEEE CAMSAP 2005, Dec. 12-14, 2005, Puerto Vallarta,
Mexico.

5. G. Latsoudas, N.D. Sidiropoulos, “A Two-stage FASTMAP-MDS Approach for Node
Localization in Sensor Networks”, in Proc. IEEE CAMSAP 2005, Dec. 12-14, 2005, Puerto Vallarta,
Mexico. 

6.  N.D.  Sidiropoulos,  T.N.  Davidson,  “Broadcasting  with  Channel  State  Information”,  in  Proc. 
IEEE  SAM  2004,  July  18-21,  Sitges,  Barcelona,  Spain. 

7.  G.  Dimic,  N.D.  Sidiropoulos,  “Low-Complexity  Downlink  Beamforming  for  Maximum  Sum 
Capacity”,  in  Proc.  IEEE  ICASSP  2004,  May  17-21,  Montreal,  Quebec,  Canada. 

All  journal  and  conference  papers  produced  to  date  are  included  in  the  Annex. 

Our research of course continues; in addition to the above, the following journal papers
stemming from our ICASSP06 conference paper are currently in progress:

1. E. Karipidis, N.D. Sidiropoulos, Z.-Q. (Tom) Luo, “Far-field Multicast Beamforming of
Uniform Linear Antenna Arrays is a Convex Problem”, in preparation for submission to IEEE Trans.
on Signal Processing.

2. E. Karipidis, N.D. Sidiropoulos, Z.-Q. (Tom) Luo, “Robust Transmit Beamforming for
Multicasting”, in preparation for submission to IEEE Trans. on Signal Processing.

V. Conclusions and Recommendations

A. Multiuser Transmit Beamforming
A.1 Sum capacity objective

We have considered two algorithms that capitalize on multiuser diversity to achieve a
significant fraction of the multi-antenna downlink sum capacity when the number of users, M,
is  greater  than  the  number  of  antennas,  N.  We  have  analyzed  the  throughput  performance  of 
the  greedy  zero-forcing  dirty  paper  (gZF-DP)  algorithm  in  independent  Rayleigh  fading,  and 
characterized the pdfs of certain key parameters of interest. Determining the proper number
of  samples  required  for  accurate  Monte  Carlo  estimates  is  a  difficult  issue  without  a  baseline. 
While  the  end  result  of  gZF-DP  performance  analysis  requires  sequential  numerical  integration 
and  is  admittedly  cumbersome,  it  does  provide  such  a  baseline  and  thus  corroborates  the  results 
of  Monte  Carlo  estimation.  Also,  numerical  integration  is  simpler  than  Monte  Carlo  simulation 
for  a  small  number  of  transmit  antennas.  Furthermore,  our  analysis  allowed  us  to  establish  that 
at  high  SNR  the  throughput  versus  SNR  slope  of  the  gZF-DP  algorithm  is  proportional  to  N. 

We have also proposed another low-complexity algorithm, dubbed ZFS, which does not
require DP coding at the transmitter. We have shown that the selection procedures of gZF-DP
and ZFS algorithms have the same complexity order, $O(N^3 M)$, which is significantly smaller
than the complexity of the optimal algorithms when $M \gg N$. We have evaluated the
throughput performance of the ZFS algorithm via simulations. The results show that for a realistic
number  of  transmit  antennas,  ZFS  achieves  a  significant  fraction  of  the  throughput  of  gZF-DP 
and  sum  capacity,  at  a  low  coding  and  on-line  computation  cost.  The  simulation  results  also 
indicate  that,  at  high  SNR,  ZFS  achieves  the  same  slope  of  throughput  per  dB  of  SNR  as  the 
capacity-achieving strategy based on the use of DP coding for known interference cancellation
and  convex  optimization. 

Due  to  its  simplicity,  low  complexity,  and  close  to  optimal  performance,  the  proposed  ZFS 
method offers an attractive alternative to earlier DP-based methods when $M \gg N$. ZFS is hard
to  beat  from  a  performance-complexity  trade-off  point  of  view.  This  is  attributed  to  multiple  user 
selection  diversity,  which  generalizes  the  concept  of  multiuser  diversity,  due  to  Tse,  by  selecting 
to  serve  a  group  of  users,  versus  a  single  user.  We  believe  that  ZFS  has  strong  potential  of  being 
implemented  in  actual  systems  (there  is  recent  follow-up  work  by  Morgan,  Huang,  of  Bell  Labs 
/  Lucent  Technologies,  as  well  as  European  industry  R&D  groups). 

A.2 Multicasting under Quality of Service (QoS) and Max-min Fair (MMF) objectives

We have taken a new look at the broadcasting/multicasting problem when channel state
information is available at the transmitter. We have proposed two pertinent problem formulations:
minimizing transmitted power under multiple minimum received power constraints, and
maximizing the minimum received power subject to a bound on the transmitted power. We have
shown that both formulations are NP-hard optimization problems; however, their solution can
often be well approximated using semidefinite relaxation tools. We have explored the
relationship between the two formulations and also insights afforded by Lagrangian duality theory. In
view  of  i)  our  extensive  numerical  experiments  with  simulated  and  measured  data,  verifying  that 
semidefinite  relaxation  consistently  yields  good  performance,  ii)  proof  that  the  basic  problem  is 
NP-hard,  and  thus  approximation  is  unavoidable,  and  iii)  corroborating  motivation  provided  by 
duality  theory,  we  conclude  that  the  approximate  solutions  provided  herein  offer  useful  designs 
across  a  broad  range  of  applications. 

The downlink beamforming problem was considered for the general case of multiple
co-channel multicast groups, under two design criteria: QoS, in which we seek to minimize the
total transmitted power while guaranteeing a prescribed minimum SINR at all receivers; and a
fair  objective,  in  which  we  seek  to  maximize  the  minimum  received  SINR  under  a  total  power 
constraint.  Both  formulations  contain  single  group  multicast  beamforming  as  a  special  case, 
and  are  therefore  NP-hard.  Computationally  efficient  quasi-optimal  solutions  were  proposed 
by  means  of  SDR  and  a  combined  randomization  -  multi-group  multicast  power  control  loop. 
Extensive  numerical  results  have  been  presented,  using  both  simulated  (i.i.d.  Rayleigh)  and 
measured  stationary  outdoor  wireless  channel  data,  showing  that  the  proposed  algorithms  yield 
high quality approximate solutions at a moderate complexity cost. Interestingly, our
numerical findings indicate that the solutions generated by our algorithms are often exactly optimal,
especially in the case of measured channels. In certain cases this optimality can be proven
beforehand, and alternative convex reformulations of lower complexity have been constructed; in
other  cases,  a  theoretical  worst-case  bound  on  approximation  accuracy  has  been  derived,  and 
shown  to  be  tight. 

Whereas  multi-group  multicast  transmit  beamforming  under  SINR  constraints  is  NP-hard  in 
general,  we  have  shown  that,  in  the  special  case  of  Vandermonde  steering  vectors  it  is  in  fact  a 
semidefinite program, which can be efficiently solved. We have also considered robust
beamforming solutions under channel uncertainty for the case of a single multicast group. For general
steering  vectors,  we  have  shown  that  exact  solutions  of  the  robust  and  non-robust  versions  of  the 
problem  are  related  via  a  simple  one-to-one  scaling  transformation.  Since  both  problems  are 
NP-hard, this suggests an algorithm to generate a quasi-optimal solution for one given a
quasi-optimal solution for the other. In the important special case of Vandermonde steering vectors,
we have shown that the robust version of the problem is convex as well. This robust solution can
be  extended  to  the  multi-group  Vandermonde  case. 

B.  Multiple  Input  Multiple  Output  Decoding 

We have presented a two-stage hybrid PDA-SD algorithm for signal detection in MIMO
systems. The basic idea is dimensionality reduction via hard decoding and cancellation of those
symbols  that  can  be  quickly  and  reliably  detected  via  a  single  PDA  stage.  In  the  V-BLAST 
scenario  considered,  simulations  show  that  the  proposed  hybrid  algorithm  attains  performance 
close  to  SD,  at  a  complexity  close  to  PDA.  The  dimensionality  reduction  idea  can  also  be  applied 
in  conjunction  with  other  variants  of  SD  or  SDR. 

We have also proposed a new SDR approach for MIMO detection of high-order QAM
constellations. The new approach is the simplest one in the class of SDR detectors for high-order QAM:
its  worst-case  complexity  is  nearly  cubic  in  the  dimension  of  the  transmitted  symbol  vector,  and 
independent  of  the  constellation  order  for  uniform  QAM  /  affine  in  the  constellation  order  for 
non-uniform QAM. Under certain conditions, the new approach affords significant
improvements in SER over prior methods. Specifically, the Sphere Decoder (SD) family of detectors
exhibits a threshold behavior: it either works very well (for low-enough symbol vector
dimension, order of the individual symbol constellation, and high-enough SNR) or it freezes. The
threshold  between  the  two  regimes  depends  on  a  combination  of  these  three  factors.  When  SD 
works, it outperforms SDR in terms of complexity and SER performance. In difficult scenarios,
SDR offers an attractive alternative relative to earlier solutions.

C. Acquiring Channel State Information: Node Localization

We  have  proposed  a  hybrid  two-stage  node  localization  algorithm  that  offers  better  accuracy 
than  existing  alternatives  of  the  same  (and,  in  certain  cases,  even  higher)  complexity  order.  The 
new  algorithm  employs  Fastmap,  coupled  with  judicious  selection  of  anchor  nodes  that  double 
as pivots, to generate a computationally cheap yet sufficiently accurate initialization for
gradient descent. The new algorithm is particularly attractive (in terms of the offered
performance-complexity trade-off) in the case of dense networks.
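
To illustrate the kind of cheap initialization Fastmap provides, the following minimal sketch computes a single Fastmap coordinate from pairwise distances using the cosine law. It is the generic textbook projection step only; the function name `fastmap_coordinate` is hypothetical, and the anchor-based pivot selection and the subsequent gradient descent described above are not reproduced here.

```python
def fastmap_coordinate(dist, a, b):
    """Project every node onto the line through pivot nodes a and b.

    dist[i][j] is the (estimated) distance between nodes i and j.
    Returns one coordinate per node, obtained from the cosine law.
    """
    d_ab = dist[a][b]
    if d_ab == 0:
        raise ValueError("pivots must be two distinct points")
    return [
        (dist[a][i] ** 2 + d_ab ** 2 - dist[b][i] ** 2) / (2 * d_ab)
        for i in range(len(dist))
    ]


# Toy example: three collinear nodes at positions 0, 3 and 5.
positions = [0.0, 3.0, 5.0]
dist = [[abs(p - q) for q in positions] for p in positions]
coords = fastmap_coordinate(dist, 0, 2)  # pivots: first and last node
```

With noise-free distances and collinear nodes the projection recovers the positions exactly; with noisy distance estimates it only provides a starting point for refinement.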

We  also  proposed  using  our  adaptation  of  Fastmap  as  initialization  for  Costa’s  algorithm.  The 
latter  combination  appears  useful  for  sparse  networks,  in  which  case  it  attains  better  estimation 
performance  than  Fastmap  followed  by  steepest  descent  (SD),  albeit  at  a  higher  complexity  cost. 
Our  simulations  indicate  that,  in  the  context  of  our  present  application,  Fastmap+SD  uniformly 
outperforms  the  classical  principal  component  analysis  (PCA)-based  multi-dimensional  scaling 
(MDS)  algorithm,  both  in  terms  of  complexity  and  in  terms  of  estimation  accuracy.  We  have  also 
derived  the  pertinent  CRB  for  the  log-normal  multiplicative  measurement  noise  model,  which 
was  adopted  for  most  of  our  simulations. 

D.  Synchronization,  Doppler,  and  Intercept  Issues:  Particle  Filtering  Tools 

We  have  developed  three  new  particle  filtering  algorithms  for  tracking  a  frequency-hopped 
complex  sinusoid,  based  on  a  novel  stochastic  state-space  formulation.  The  algorithms  range 
from a plain-vanilla version that uses the prior importance function, to a more advanced version
that employs the optimal importance function, and, finally, an improvement of the latter using
a problem-specific outer rejection loop. The two latter algorithms afford considerably better
performance-complexity trade-offs.

We also revisited the important problem of tracking a single time-varying harmonic, whose
frequency  and  complex  amplitude  evolve  according  to  a  linear  Gaussian  separable  AR(1)  model. 
A  key  difficulty  in  treating  this  model  comes  from  the  nonlinear  measurement  equation.  For 
this  model,  we  derived  the  optimal  importance  function  in  closed  form.  This  yields  interesting 
insights  and  opens  up  the  possibility  of  designing  particle  filters  that  are  more  efficient  than 
generic  ones.  We  also  derived  a  procedure  to  sample  from  this  optimal  importance  function, 
using  rejection  and  the  concept  of  a  dominating  density.  Our  numerical  experiments  comparing 
the  resulting  filter  to  standard  particle  filters  and  the  CRB  show  that  the  proposed  PF  algorithm 
has  merits,  particularly  in  terms  of  reducing  the  number  of  particles,  and  therefore  memory 
requirements  as  well. 

Abstract

The  problem  of  transmit  beamforming  to  multiple  co-channel  multicast  groups  is  considered, 
from  two  viewpoints:  minimizing  total  transmission  power  while  guaranteeing  a  prescribed  minimum 
signal-to-interference-plus-noise-ratio  (SINR)  at  each  receiver;  and  a  “fair”  approach  maximizing  the 
overall  minimum  SINR  under  a  total  power  budget.  The  core  problem  is  a  multicast  generalization 
of  the  multiuser  downlink  beamforming  problem;  the  difference  is  that  each  transmitted  stream 
is  directed  to  multiple  receivers,  each  with  its  own  channel.  Such  generalization  is  relevant  and 
timely, e.g., in the context of 802.16e wireless networks. The joint problem also contains
single-group multicast beamforming as a special case. The latter (and therefore also the former) is
NP-hard. This motivates the pursuit of computationally efficient quasi-optimal solutions. It is shown
that Lagrangian relaxation coupled with suitable randomization / co-channel multicast power control
loops yields computationally efficient high-quality approximate solutions. For a significant fraction of
problem  instances,  the  solutions  generated  this  way  are  exactly  optimal.  Extensive  numerical  results 
using  both  simulated  and  measured  wireless  channels  are  presented  to  corroborate  our  main  findings. 

I. Introduction

The  proliferation  of  streaming  media  (digital  audio,  video,  IP  radio),  peer-to-peer  services,  large- 
scale  software  updates,  and  profiled  newscasts  over  the  wireline  Internet  has  brought  renewed  interest 
in  multicast  routing  protocols.  These  protocols  were  originally  conceived  and  have  since  evolved 
under  the  “wireline  premise”:  the  physical  network  is  a  graph  comprising  point-to-point  links  that  do 
not  interfere  with  each  other  at  the  physical  layer.  Today,  multicast  routing  protocols  operate  at  the 
network  or  application  layer,  using  either  controlled  flooding  or  minimum  spanning  tree  access. 

As  wireless  networks  become  ever  more  ubiquitous,  and  wireless  becomes  the  choice  for  not  only 
the  “last  hop”  but  also  suburban-  and  metropolitan-area  backbones,  wireless  multicasting  solutions  are 
needed to account for and exploit the idiosyncrasies of the wireless medium. Wireless is inherently
a broadcast medium, where it is possible to reach multiple destinations with a single transmission;
different co-channel transmissions interfere with one another at the intended destination(s); and
links are subject to fading and shadowing, in addition to co-channel interference.

The  broadcast  advantage  of  wireless  has  of  course  been  exploited  since  the  early  days  of  radio.  The 
interference  problem  was  dealt  with  by  allocating  different  frequency  bands  to  the  different  stations, 
and  transmission  was  mostly  isotropic  or  focused  towards  a  specific  service  area. 

Today,  the  situation  with  wireless  networks  is  much  different.  First,  transmissions  need  not  be 
“blind”.  Many  wireless  network  standards  provision  the  use  of  transmit  antenna  arrays.  Using  baseband 
beamforming,  it  is  possible  to  steer  energy  in  the  direction(s)  of  the  intended  users,  whose  locations 
(or,  more  generally,  channels)  can  often  be  accurately  estimated.  Second,  the  push  towards  higher 
capacity  and  end-user  rates  necessitates  co-channel  transmission  which  exploits  the  spatial  diversity 
in the user population (spatial multiplexing). Third, quality of service is an important consideration,
especially in wireless backhaul solutions like 802.16e. Finally, due to co-channel interference, wireless
multicasting  cannot  be  dealt  with  in  isolation,  one  group  at  a  time;  a  joint  solution  is  needed. 

The  problem  of  transmit  beamforming  towards  a  (single)  group  of  users  was  first  considered  in 
the  Ph.D.  thesis  of  Lopez  [9],  using  the  averaged  (over  all  users  in  the  group)  received  Signal  to 
Noise  Ratio  (SNR)  as  the  design  criterion.  The  solution  boils  down  to  a  relatively  simple  eigenvalue 
problem, but no SNR guarantee is provided this way: some users may get really poor SNR [11]. This
is  not  acceptable  in  multicasting  applications,  because  it  is  the  worst  SNR  that  determines  the  common 
information  rate.  Quality  of  service  (providing  a  guaranteed  minimum  received  SNR  to  every  user) 
and  max-min-fair  (maximizing  the  smallest  received  SNR)  designs  were  first  proposed  in  [10],  [11], 
where  it  was  shown  that  the  core  problem  is  NP-hard,  yet  high-quality  approximate  solutions  can  be 
obtained  using  relaxation  techniques  based  on  semidefinite  programming  (SDP).  The  latter  is  a  class 
of  convex  optimization  problems  which  can  be  solved  in  polynomial  time  by  powerful  interior  point 
methods.

As  already  mentioned,  designing  a  transmit  beamformer  separately  for  each  multicast  group  can 
be  far  from  optimal,  due  to  inter-group  interference.  In  this  paper,  we  consider  the  joint  design 
problem  under  quality  of  service  and  max-min-fair  criteria.  In  addition  to  semidefinite  relaxation 
ideas,  our  solutions  entail  a  co-channel  multicast  power  control  component,  which  can  be  viewed  as 
a  generalization  of  multiuser  power  control  ideas  for  the  cellular  downlink  (e.g.,  see  [3]  and  references 
therein).  The  multiuser  downlink  beamforming  problem  (e.g.,  see  [1]  and  references  therein)  can  be 
viewed  as  a  special  case  of  our  formulation,  where  each  multicast  group  consists  of  a  single  receiver. 

A  carefully  designed  suite  of  numerical  results  is  used  to  demonstrate  the  efficacy  of  our  designs, 
including  extensive  results  using  measured  wireless  channel  data. 

II.  Data  Model  and  Problem  Statement 

Consider a wireless scenario incorporating a single transmitter with $N$ antenna elements and $M$
receivers, each with a single antenna. Let $\mathbf{h}_i$ denote the $N \times 1$ complex vector that models the
propagation loss and phase shift of the frequency-flat quasi-static channel from each transmit antenna
to the receive antenna of user $i \in \{1, \ldots, M\}$. Let there be a total of $1 \le G \le M$ multicast
groups, $\{\mathcal{G}_1, \ldots, \mathcal{G}_G\}$, where $\mathcal{G}_k$ contains the indices of receivers participating in multicast group
$k$, and $k \in \{1, \ldots, G\}$. Each receiver listens to a single multicast; thus $\mathcal{G}_k \cap \mathcal{G}_l = \emptyset$, $l \neq k$,
$\cup_k \mathcal{G}_k = \{1, \ldots, M\}$, and, denoting $G_k := |\mathcal{G}_k|$, $\sum_{k=1}^{G} G_k = M$.

Let $\mathbf{w}_k$ denote the beamforming weight vector applied to the $N$ transmitting antenna elements to
generate the spatial channel for transmission to group $k$ (see Fig. 1). Then the signal transmitted by
the antenna array is equal to $\sum_{k=1}^{G} \mathbf{w}_k s_k(t)$, where $s_k(t)$ is the temporal information-bearing signal
directed to receivers in multicast group $k$. Note that the above setup includes the case of broadcasting
(a single multicast group, $G = 1$) [11], as well as the case of individual information transmission to
each receiver ($G = M$) by means of spatial multiplexing (see, e.g., [1]). If each $s_k(t)$ is zero-mean,
temporally white with unit variance, and the waveforms $\{s_k(t)\}_{k=1}^{G}$ are mutually uncorrelated, then
the total power radiated by the transmitting antenna array is equal to $\sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2$.
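
As an illustration of the data model, the sketch below evaluates the total radiated power and the per-receiver SINR for a given set of beamformers. It assumes NumPy; the function names, the array layout (beamformers and channels stored as matrix columns), and the toy numbers are illustrative choices, not part of the original formulation.

```python
import numpy as np


def total_power(W):
    """Total radiated power: sum over k of ||w_k||^2, with w_k the columns of W (N x G)."""
    return float(np.sum(np.abs(W) ** 2))


def sinr(W, H, group_of, sigma2):
    """SINR at every receiver.

    W        : N x G matrix, column k is the beamformer w_k
    H        : N x M matrix, column i is the channel h_i
    group_of : length-M integer array, group_of[i] = k(i)
    sigma2   : length-M array of noise variances
    """
    gains = np.abs(W.conj().T @ H) ** 2          # gains[k, i] = |w_k^H h_i|^2
    idx = np.arange(H.shape[1])
    signal = gains[group_of, idx]                # power of the intended stream
    interference = gains.sum(axis=0) - signal    # power of all other streams
    return signal / (interference + sigma2)


# Toy example: N = 2 antennas, G = 2 groups, one receiver per group.
W = np.eye(2, dtype=complex)
H = np.array([[1.0, 0.5], [0.5, 1.0]], dtype=complex)
group_of = np.array([0, 1])
sigma2 = np.array([1.0, 1.0])
values = sinr(W, H, group_of, sigma2)
power = total_power(W)
```

In this toy case each receiver sees unit signal power, 0.25 of interference and unit noise, so both SINRs equal 0.8 at a total power of 2.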

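The joint design of transmit beamformers can then be posed as the problem of minimizing the
total radiated power subject to meeting prescribed SINR constraints $\gamma_i$ at each of the $M$ receivers

$$
\mathcal{Q}: \quad
\begin{aligned}
\min_{\{\mathbf{w}_k \in \mathbb{C}^{N}\}_{k=1}^{G}} \quad & \sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2 \\
\text{s.t.:} \quad & \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}{\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2} \ge \gamma_i, \quad \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}.
\end{aligned}
$$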

Problem $\mathcal{Q}$ contains the associated broadcasting problem ($G = 1$) as a special case; from this and
[11], it immediately follows that
Claim 1: Problem $\mathcal{Q}$ is NP-hard.

This motivates (cf. [4]) the pursuit of sensible approximate solutions to problem $\mathcal{Q}$.¹

III.  Relaxation 

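Towards this end, let us define $\mathbf{Q}_i := \mathbf{h}_i \mathbf{h}_i^H$ and $\mathbf{X}_k := \mathbf{w}_k \mathbf{w}_k^H$, and note that $|\mathbf{w}_k^H \mathbf{h}_i|^2 =
\mathbf{h}_i^H \mathbf{w}_k \mathbf{w}_k^H \mathbf{h}_i = \mathrm{trace}(\mathbf{h}_i^H \mathbf{w}_k \mathbf{w}_k^H \mathbf{h}_i) = \mathrm{trace}(\mathbf{h}_i \mathbf{h}_i^H \mathbf{w}_k \mathbf{w}_k^H) = \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k)$. Note that $\mathbf{X}_k = \mathbf{w}_k \mathbf{w}_k^H$
for some $\mathbf{w}_k \in \mathbb{C}^N$ if and only if $\mathbf{X}_k \succeq 0$ and $\mathrm{rank}(\mathbf{X}_k) = 1$. It follows that problem $\mathcal{Q}$ can be
equivalently reformulated as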

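$$
\begin{aligned}
\min_{\{\mathbf{X}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G}} \quad & \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k) \\
\text{s.t.:} \quad & \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k) \ge \gamma_i \sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) + \gamma_i \sigma_i^2, \\
& \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}, \\
& \mathbf{X}_k \succeq 0, \ \forall k \in \{1, \ldots, G\}, \\
& \mathrm{rank}(\mathbf{X}_k) = 1, \ \forall k \in \{1, \ldots, G\},
\end{aligned}
$$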

where  the  fact  that  the  terms  in  the  denominator  are  all  nonnegative  has  also  been  taken  into  account. 
Dropping  the  last  G  rank-one  constraints,  which  are  nonconvex,  we  arrive  at  the  following  relaxation 
of  problem  Q 


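$$
\mathcal{Q}_r: \quad
\begin{aligned}
\min_{\{\mathbf{X}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G},\ \{s_i \in \mathbb{R}\}_{i=1}^{M}} \quad & \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k) \\
\text{s.t.:} \quad & \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k) - \gamma_i \sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) - s_i = \gamma_i \sigma_i^2, \\
& \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}, \\
& s_i \ge 0, \ \forall i \in \{1, \ldots, M\}, \\
& \mathbf{X}_k \succeq 0, \ \forall k \in \{1, \ldots, G\},
\end{aligned}
$$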

where $M$ nonnegative real “slack” variables $s_i$ have been introduced, in order to convert the first $M$
linear inequality constraints to $M$ linear equality constraints, plus $M$ nonnegativity constraints.

¹Note that other special cases of problem $\mathcal{Q}$ are not NP-hard: e.g., the multiuser downlink beamforming problem ($G = M$)
is a Second Order Cone Program (SOCP) [1]; see also [7] for a restriction on the channel vectors that enables convex
reformulation and thereby efficient solution of the problem.


Next, we seek to express the equality constraints as linear combinations of the unknown vector
$\mathbf{x} = [\mathrm{vec}(\mathbf{X}_1)^T \cdots \mathrm{vec}(\mathbf{X}_G)^T]^T$, which is formed by stacking the columns of the $\mathbf{X}_k$ matrices.
Towards this end, the $G \times 1$ vectors

$$
\mathbf{a}_i = (\gamma_i + 1)\,\mathbf{e}_{k(i)} - \gamma_i \mathbf{1}_G, \quad \forall i \in \{1, \ldots, M\},
$$

are introduced, whose $k(i)$-th element is equal to one, whereas all others are set to $-\gamma_i$. Here, $\mathbf{e}_{k(i)}$
is the $G \times 1$ vector indicating the multicast group $k(i)$ that user $i$ belongs to, and $\mathbf{1}_G$ is the $G \times 1$
all-ones vector. Using $\mathbf{a}_i$, we can recast the equality constraints as

$$
[\mathbf{a}_i \otimes \mathrm{vec}(\mathbf{Q}_i^T)]^T \mathbf{x} - s_i = \gamma_i \sigma_i^2, \quad \forall i \in \{1, \ldots, M\},
$$

where $\otimes$ denotes the Kronecker product. Finally, the relaxed problem $\mathcal{Q}_r$ is written as


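$$
\mathcal{Q}_r: \quad
\begin{aligned}
\min_{\mathbf{x} \in \mathbb{C}^{GN^2},\ \{s_i \in \mathbb{R}\}_{i=1}^{M}} \quad & [\mathbf{1}_G \otimes \mathrm{vec}(\mathbf{I}_N)]^T \mathbf{x} \\
\text{s.t.:} \quad & [\mathbf{a}_i \otimes \mathrm{vec}(\mathbf{Q}_i^T)]^T \mathbf{x} - s_i = \gamma_i \sigma_i^2, \\
& \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}, \\
& s_i \ge 0, \ \forall i \in \{1, \ldots, M\}, \\
& \mathbf{X}_k \succeq 0, \ \forall k \in \{1, \ldots, G\}.
\end{aligned}
$$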

Here, $\mathbf{I}_N$ is the identity matrix of size $N \times N$. Problem $\mathcal{Q}_r$ is a Semi-Definite Program (SDP),
expressed in the primal standard form used by SDP solvers, such as SeDuMi [12]. This SDP has $G$ matrix
variables of size $N \times N$, and $M$ linear constraints. Interior point methods will take $O(\sqrt{GN} \log(1/\epsilon))$
iterations, with each iteration requiring at most $O(G^3 N^6 + M G N^2)$ arithmetic operations, where the
parameter $\epsilon$ represents the solution accuracy at the algorithm’s termination. SeDuMi uses interior
point methods to solve such SDP problems efficiently. Actual runtime complexity will usually scale
far slower with $G$, $N$, $M$ than this worst-case bound.
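
The vectorized constraint rows can be sanity-checked numerically. The sketch below (NumPy assumed; the dimensions, random data, and variable names are arbitrary illustrative choices) builds $\mathbf{a}_i \otimes \mathrm{vec}(\mathbf{Q}_i^T)$ for one receiver and confirms that its inner product with the stacked vector $\mathbf{x}$ equals the signal power minus $\gamma_i$ times the interference power, when every $\mathbf{X}_k$ is rank-one.

```python
import numpy as np

rng = np.random.default_rng(0)
N, G = 3, 2
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)            # channel h_i
W = rng.standard_normal((N, G)) + 1j * rng.standard_normal((N, G))  # columns w_k

Q = np.outer(h, h.conj())                                    # Q_i = h_i h_i^H
X = [np.outer(W[:, k], W[:, k].conj()) for k in range(G)]    # X_k = w_k w_k^H


def vec(A):
    """Stack the columns of A into one long vector."""
    return A.reshape(-1, order="F")


x = np.concatenate([vec(Xk) for Xk in X])

gamma, k_i = 2.0, 0                      # SINR target and group index k(i) of receiver i
e = np.zeros(G)
e[k_i] = 1.0
a = (gamma + 1.0) * e - gamma * np.ones(G)       # a_i: 1 at k(i), -gamma elsewhere

row = np.kron(a, vec(Q.T))               # a_i (Kronecker) vec(Q_i^T)
lhs = row @ x                            # plain transpose, no conjugation

gains = [abs(np.vdot(W[:, k], h)) ** 2 for k in range(G)]    # |w_k^H h_i|^2
rhs = gains[k_i] - gamma * sum(g for k, g in enumerate(gains) if k != k_i)
```

Here `lhs` and `rhs` agree up to round-off, and the imaginary part of `lhs` vanishes, as expected from $\mathrm{vec}(\mathbf{Q}_i^T)^T \mathrm{vec}(\mathbf{X}_k) = \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k)$.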

IV.  Obtaining  an  Approximate  Solution  to  Problem  Q 
Problem  Q  may  not  admit  a  feasible  solution  (counter-examples  may  be  easily  constructed),  but  if  it 
does,  the  aforementioned  approach  will  yield  a  solution  to  problem  Qr.  Due  to  relaxation,  this  solution 
will  not,  in  general,  consist  of  rank-one  blocks.  In  order  to  obtain  a  high-quality  approximate  solution 
of  problem  Q,  the  concept  of  randomization  can  be  employed  to  generate  candidate  beamforming 
vectors  in  the  span  of  the  respective  transmit  covariance  matrices.  The  main  difference  relative  to 
the  simpler  broadcast  case  (G  =  1)  considered  in  [11],  is  that  here  we  cannot  simply  “scale  up” 
the  candidate  beamforming  vectors  generated  during  randomization  to  satisfy  the  SINR  constraints 
of problem $\mathcal{Q}$. The reason is that, in contrast to [11], we herein deal with an interference scenario,
and boosting one group’s beamforming vector also increases interference to nodes in other groups.
Whether it is feasible to satisfy the constraints for a given set of candidate beamforming vectors is
also an issue here. Let $\alpha_{k,i} := |\mathbf{w}_k^H \mathbf{h}_i|^2$ denote the signal power received at receiver $i$ from the stream
directed towards users in multicast group $k$. Let $\beta_k := \|\mathbf{w}_k\|_2^2$, and $p_k$ denote the power boost (or
reduction) factor for multicast group $k$. Then the following Multi-Group Power Control ($\mathcal{MGPC}$)
problem emerges in converting candidate beamforming vectors to a candidate solution of problem $\mathcal{Q}$.


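$$
\mathcal{MGPC}: \quad
\begin{aligned}
\min_{\{p_k \in \mathbb{R}\}_{k=1}^{G}} \quad & \sum_{k=1}^{G} \beta_k p_k \\
\text{s.t.:} \quad & \frac{p_k \alpha_{k,i}}{\sum_{l \neq k} p_l \alpha_{l,i} + \sigma_i^2} \ge \gamma_i, \quad \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}, \\
& p_k \ge 0, \ \forall k \in \{1, \ldots, G\}.
\end{aligned}
$$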


As in Section III, taking advantage of the fact that the terms in the denominator are all nonnegative
and introducing $M$ nonnegative real “slack” variables $s_i$, problem $\mathcal{MGPC}$ can be equivalently
reformulated as


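$$
\mathcal{MGPC}: \quad
\begin{aligned}
\min_{\{p_k \in \mathbb{R}\}_{k=1}^{G},\ \{s_i \in \mathbb{R}\}_{i=1}^{M}} \quad & \sum_{k=1}^{G} \beta_k p_k \\
\text{s.t.:} \quad & p_k \alpha_{k,i} - \gamma_i \sum_{l \neq k} p_l \alpha_{l,i} - s_i = \gamma_i \sigma_i^2, \\
& \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}, \\
& p_k \ge 0, \ \forall k \in \{1, \ldots, G\}, \\
& s_i \ge 0, \ \forall i \in \{1, \ldots, M\}.
\end{aligned}
$$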


Towards transforming the $\mathcal{MGPC}$ problem formulation to the primal standard form used by convex
optimization problem solvers, such as SeDuMi, we denote $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_G]^T$, $\mathbf{p} = [p_1, \ldots, p_G]^T$,
and $\boldsymbol{\alpha}_i = [\alpha_{1,i}, \ldots, \alpha_{G,i}]^T$. We can now recast problem $\mathcal{MGPC}$ as
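
$$
\mathcal{MGPC}: \quad
\begin{aligned}
\min_{\mathbf{p} \in \mathbb{R}^{G},\ \{s_i \in \mathbb{R}\}_{i=1}^{M}} \quad & \boldsymbol{\beta}^T \mathbf{p} \\
\text{s.t.:} \quad & [\mathbf{a}_i \odot \boldsymbol{\alpha}_i]^T \mathbf{p} - s_i = \gamma_i \sigma_i^2, \\
& \forall i \in \mathcal{G}_k, \ \forall k \in \{1, \ldots, G\}, \\
& p_k \ge 0, \ \forall k \in \{1, \ldots, G\}, \\
& s_i \ge 0, \ \forall i \in \{1, \ldots, M\},
\end{aligned}
$$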

where $\mathbf{a}_i$ are the $G \times 1$ vectors introduced in Section III and $\odot$ stands for element-wise multiplication
(the Hadamard product). Problem $\mathcal{MGPC}$ is a Linear Program (LP) with $G$ nonnegative variables
and $M$ linear inequality constraints. Interior point methods can either find the problem infeasible or
generate an $\epsilon$-optimal solution in $O(\sqrt{G} \log(1/\epsilon))$ iterations, each requiring at most $O(G^3 + MG)$
arithmetic operations. SeDuMi can be used to find its optimum solution. Note that SeDuMi will
also yield an infeasibility certificate in case the $\mathcal{MGPC}$ problem is not solvable for a particular
beamforming configuration. This is useful to determine the feasibility of a candidate beamforming
configuration.
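
For readers without access to SeDuMi, the power control LP can be prototyped with any LP solver. The following is a minimal sketch using SciPy's `linprog` (a substitute for the solver used in this work, not the original implementation). It keeps the constraints in inequality form rather than introducing slack variables; the function name `mgpc` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def mgpc(alpha, beta, group_of, gamma, sigma2):
    """Multi-group power control for fixed candidate beamformers.

    alpha[k, i] = |w_k^H h_i|^2, beta[k] = ||w_k||^2, group_of[i] = k(i).
    Returns (p, total_power), or None when the LP is not solved to optimality
    (in particular, when it is infeasible).
    """
    G, M = alpha.shape
    A_ub = np.zeros((M, G))
    for i in range(M):
        a_i = -gamma[i] * np.ones(G)      # -gamma_i on interfering groups
        a_i[group_of[i]] = 1.0            # +1 on the receiver's own group
        # (a_i * alpha_i)^T p >= gamma_i sigma_i^2, negated into "<=" form
        A_ub[i] = -(a_i * alpha[:, i])
    b_ub = -(np.asarray(gamma, dtype=float) * np.asarray(sigma2, dtype=float))
    res = linprog(c=beta, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * G, method="highs")
    if res.status != 0:
        return None
    return res.x, res.fun


# Toy example: two groups, one receiver each, weak cross-talk.
alpha = np.array([[1.0, 0.1], [0.1, 1.0]])
beta = [1.0, 1.0]
result = mgpc(alpha, beta, [0, 1], gamma=[1.0, 1.0], sigma2=[1.0, 1.0])
too_ambitious = mgpc(alpha, beta, [0, 1], gamma=[20.0, 20.0], sigma2=[1.0, 1.0])
```

In the toy example both constraints are tight at $p_1 = p_2 = 10/9$, so the minimum total power is $20/9$; with the much higher targets the constraints contradict each other and the function returns `None`, mirroring the infeasibility certificate mentioned above.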

For $G = M$ (independent information transmission to each receiver), problem $\mathcal{Q}_r$ is in fact
equivalent to (not a relaxation of) problem $\mathcal{Q}$, see [1]; likewise, problem $\mathcal{MGPC}$ reduces to the
well-known multiuser downlink power control problem, which can be solved using simpler means
(e.g., [3]): matrix inversion and iterative descent algorithms. In this special case, (in)feasibility can
be determined from the spectral radius of a certain “connectivity” matrix. Similar simplifications
for the general instance of $\mathcal{MGPC}$ are perhaps possible, but appear highly non-trivial. At any rate,
interior point LP routines are very efficient, hence this is not a major issue. The overall algorithm
for obtaining an approximate solution to problem $\mathcal{Q}$ can be summarized as follows:

1) Relaxation: Solve problem $\mathcal{Q}_r$, using an SDP solver (e.g., SeDuMi). Denote the solution $\{\mathbf{X}_k\}_{k=1}^{G}$.

2) Randomization / Scaling Loop: For each $k$, generate a vector in the span of $\mathbf{X}_k$, using the
Gaussian randomization technique (randC) in [11]. If, for some $k$, $\mathrm{rank}(\mathbf{X}_k) = 1$, then use
the principal component instead. Next, feed the resulting set of candidate beamforming vectors
$\{\mathbf{w}_k\}_{k=1}^{G}$ into problem $\mathcal{MGPC}$ and solve it using SeDuMi. If the particular instance of $\mathcal{MGPC}$
is infeasible or yields a larger $\mathcal{MGPC}$ objective than previously checked candidates, discard
the proposed set of candidate beamforming vectors; else, record the solution and associated
objective value.
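
The candidate-generation part of step 2 can be sketched as follows. This is one common form of Gaussian randomization, drawing $\mathbf{w} = \mathbf{U} \boldsymbol{\Sigma}^{1/2} \mathbf{e}$ with $\mathbf{e}$ circularly symmetric complex Gaussian from the eigendecomposition $\mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{U}^H$; whether it matches randC in [11] in every detail is an assumption, as are the function name and the rank tolerance.

```python
import numpy as np


def gaussian_candidate(X, rng, rank_tol=1e-8):
    """Draw a candidate beamformer from the column space of a PSD matrix X."""
    X = (X + X.conj().T) / 2                  # re-symmetrize against round-off
    evals, U = np.linalg.eigh(X)              # eigenvalues in ascending order
    evals = np.clip(evals, 0.0, None)         # remove tiny negative eigenvalues
    if np.sum(evals > rank_tol * evals[-1]) == 1:
        # numerically rank-one: use the (scaled) principal component
        return np.sqrt(evals[-1]) * U[:, -1]
    n = len(evals)
    e = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return U @ (np.sqrt(evals) * e)           # E[w w^H] = X


rng = np.random.default_rng(1)

# Rank-one input: the principal component reproduces X exactly.
w0 = np.array([1.0, 1.0j, 0.0])
X1 = np.outer(w0, w0.conj())
v = gaussian_candidate(X1, rng)

# Rank-two input: the random draw stays inside the span of X.
X2 = np.diag([1.0, 1.0, 0.0]).astype(complex)
w = gaussian_candidate(X2, rng)
```

Each draw would then be passed, together with the draws for the other groups, to the power control problem, which rescales the candidates or declares the set infeasible.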

The  quality  of  approximate  solutions  to  problem  Q  generated  this  way  can  be  checked  against  the 
lower  bound  on  transmit  power  obtained  in  solving  problem  Qr.  This  bound  can  be  further  motivated 
from a duality perspective, as in [11]; that is, the aforementioned relaxation lower bound is in fact
the tightest lower bound on the optimum of problem $\mathcal{Q}$ attainable via Lagrangian duality [2]. This
follows from arguments in [13] (see also the single-group case in [11]), due to the fact that problem
$\mathcal{Q}$ is a quadratically constrained quadratic program.


V.  Joint  Max-Min  Fair  Beamforming 


In this section, we consider the related problem of maximizing the minimum SINR received by
any of the $M$ intended users irrespective of the multicast group they belong to, subject to an upper
bound $P$ on the total transmitted power. This problem formulation is a generalization of the respective
max-min fair transmit beamforming problem towards a single multicast group, which was considered
in [11]. The key difference is that here we seek to maximize an SINR, instead of an SNR; that is, the
beamforming vectors, which are to be optimized, appear in the numerator as well as in the denominator
of the objective function. Specifically, the joint max-min fair (JMMF) transmit beamforming design
is formulated as


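$$
\mathcal{F}: \quad
\begin{aligned}
\max_{\{\mathbf{w}_k \in \mathbb{C}^{N}\}_{k=1}^{G}} \ \min_{k \in \{1, \ldots, G\}} \ \min_{i \in \mathcal{G}_k} \quad & \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}{\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2} \\
\text{s.t.:} \quad & \sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2 \le P.
\end{aligned}
$$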

Since problem $\mathcal{F}$ contains as a special case the associated broadcasting problem ($G = 1$), it follows
from [11] that

Claim 2: Problem $\mathcal{F}$ is NP-hard.

The inequality constraint on the total transmit power will be met with equality at an optimum.
Otherwise, one could multiply all beam vectors by a constant $c > 1$, thereby increasing the minimum
SINR (note that $\sigma_i^2 > 0$). We may therefore focus on the equality constrained problem and denote
this as $\mathcal{F}$ from now on.
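
The scaling argument can be made explicit. As a short worked step, using the SINR expression of problem $\mathcal{F}$ and assuming the intended signal power at receiver $i \in \mathcal{G}_k$ is nonzero, replacing every $\mathbf{w}_k$ by $c\,\mathbf{w}_k$ with $c > 1$ gives

```latex
\mathrm{SINR}_i(c)
  = \frac{c^2\,|\mathbf{w}_k^H \mathbf{h}_i|^2}
         {c^2 \sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2}
  = \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}
         {\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2 / c^2}
  > \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}
         {\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2}
  = \mathrm{SINR}_i(1).
```

The interference terms scale with the signal, but the noise term does not, so every SINR strictly increases while the total power grows by the factor $c^2$; any slack in the power budget can therefore be converted into a higher minimum SINR.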

Claim 2 motivates the pursuit of sensible approximate solutions to the JMMF problem. Towards
this end, we introduce an auxiliary positive real variable $t$ and rewrite the (equality constrained)
JMMF downlink beamforming problem $\mathcal{F}$ as follows


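$$
\begin{aligned}
\max_{\{\mathbf{w}_k \in \mathbb{C}^{N}\}_{k=1}^{G},\ t \in \mathbb{R}} \quad & t \\
\text{s.t.:} \quad & \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}{\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2} \ge t, \quad \forall k \in \{1, \ldots, G\}, \ \forall i \in \mathcal{G}_k, \\
& \sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2 = P, \ \text{and} \ t \ge 0.
\end{aligned}
$$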


Then, using the matrices $\mathbf{Q}_i$ and $\mathbf{X}_k$ introduced in Section III, we can further recast problem $\mathcal{F}$ as

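$$
\begin{aligned}
\max_{\{\mathbf{X}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G},\ t \in \mathbb{R}} \quad & t \\
\text{s.t.:} \quad & \frac{\mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k)}{\sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) + \sigma_i^2} \ge t, \quad \forall k \in \{1, \ldots, G\}, \ \forall i \in \mathcal{G}_k, \\
& \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k) = P, \\
& \mathrm{rank}(\mathbf{X}_k) = 1, \ \forall k \in \{1, \ldots, G\}, \\
& \mathbf{X}_k \succeq 0, \ \forall k \in \{1, \ldots, G\}, \ \text{and} \ t \ge 0.
\end{aligned}
$$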


Finally, dropping the nonconvex rank constraints we obtain the following relaxation of the original
problem $\mathcal{F}$


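$$
\mathcal{F}_r: \quad
\begin{aligned}
\max_{\{\mathbf{X}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G},\ t \in \mathbb{R}} \quad & t \\
\text{s.t.:} \quad & \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k) - t \Big( \sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) + \sigma_i^2 \Big) \ge 0, \\
& \forall k \in \{1, \ldots, G\}, \ \forall i \in \mathcal{G}_k, \\
& \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k) = P, \\
& \mathbf{X}_k \succeq 0, \ \forall k \in \{1, \ldots, G\}, \ \text{and} \ t \ge 0,
\end{aligned}
$$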

where we have also taken into account the fact that the terms in the denominators of the first
$M$ inequality constraints are all nonnegative. Problem $\mathcal{F}_r$ has a linear objective function, 1 linear
equality constraint, $G$ positive semidefinite constraints, and 1 nonnegativity constraint; however, it is
nonconvex, due to the first $M$ nonlinear inequality constraints.

A solution to the relaxed problem $\mathcal{F}_r$ can be found by means of bisection over SDP problems, as
explained next. Let $t^*$ be the optimum value of problem $\mathcal{F}_r$. A feasible solution of $\mathcal{F}_r$ that is at most
$\epsilon > 0$ away from $t^*$ can be generated as follows. Let $[L, U]$ be an interval containing $t^*$. We begin by
setting $L = 0$, $U = \min_{i \in \{1, \ldots, M\}} P \|\mathbf{h}_i\|_2^2 / \sigma_i^2$, where the lower bound follows from non-negativity of $t^*$ and
the upper bound follows from the Cauchy-Schwarz inequality. Given $[L, U]$, the convex feasibility
problem $\mathcal{FP}$, shown in the box below, is solved at the midpoint $t = (L + U)/2$ of the interval. If
problem $\mathcal{FP}$ is feasible for the given choice of $t$, we set $L := t$; otherwise $U := t$. Thus, in each
iteration the interval is halved. Repeating until $U - L < \epsilon$ requires only $N_{\mathrm{iter}} = \lceil \log_2((U - L)/\epsilon) \rceil$
iterations, where $L$ and $U$ in this expression are the initial endpoints. In practice, 10-12 iterations are
usually enough for typical problem setups.
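
The bisection loop itself is independent of how feasibility is tested. The following minimal sketch takes the feasibility test as a callable; the function name and the toy oracle are illustrative, and in the actual algorithm the oracle would solve the SDP feasibility problem at the given $t$.

```python
import math


def bisect_max_feasible(is_feasible, lower, upper, eps):
    """Largest feasible t in [lower, upper], located to within eps.

    Assumes monotonicity: if t is feasible, so is every smaller t >= lower.
    Returns the last feasible midpoint (or `lower`) and the iteration count.
    """
    iterations = 0
    while upper - lower > eps:
        t = (lower + upper) / 2
        if is_feasible(t):
            lower = t          # t is achievable: raise the lower end
        else:
            upper = t          # t is not achievable: lower the upper end
        iterations += 1
    return lower, iterations


# Toy oracle standing in for the SDP feasibility problem: feasible iff t <= 3.7.
t_hat, n_iter = bisect_max_feasible(lambda t: t <= 3.7, 0.0, 10.0, 1e-3)
expected_iter = math.ceil(math.log2((10.0 - 0.0) / 1e-3))
```

For this toy interval the loop runs $\lceil \log_2(10^4) \rceil = 14$ times, in line with the iteration count stated above.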

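The convex feasibility problem $\mathcal{FP}$ is formulated, for any choice of the positive real $t$, as

$$
\mathcal{FP}: \quad
\begin{aligned}
\text{find} \quad & \mathbf{v} \\
\text{s.t.:} \quad & \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k) - t \sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) - s_i = t \sigma_i^2, \\
& \forall k \in \{1, \ldots, G\}, \ \forall i \in \mathcal{G}_k, \\
& \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k) = P, \\
& \mathbf{X}_k \succeq 0, \ \forall k \in \{1, \ldots, G\}, \\
& s_i \ge 0, \ \forall i \in \{1, \ldots, M\},
\end{aligned}
$$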

where $M$ nonnegative real “slack” variables $s_i$ have been introduced to convert the linear inequality
constraints to linear equality plus nonnegativity constraints. Here, $\mathbf{v} \in \mathbb{R}^{M} \times \mathbb{C}^{GN^2}$ denotes the
variable vector $\mathbf{v} = [\mathbf{s}^T \ \mathrm{vec}(\mathbf{X}_1)^T \cdots \mathrm{vec}(\mathbf{X}_G)^T]^T$, where the vector $\mathbf{s} = [s_1 \cdots s_M]^T$ contains the
“slack” variables. The feasibility problem $\mathcal{FP}$ comprises an objective function, set to zero, and
$M + 1$ linear equality constraints, $G$ positive semidefinite constraints, and $M$ nonnegativity constraints.
Hence, it is an SDP problem expressed in the standard primal form. Thus, for each iteration of the
aforementioned bisection algorithm, problem $\mathcal{FP}$ can be efficiently solved by SDP solvers. Similar
to problem $\mathcal{Q}_r$, this SDP feasibility problem has $G$ matrix variables of size $N \times N$, and $M + 1$ linear
constraints. So computing an $\epsilon$-feasible solution by an interior point method will have an overall
iteration count of $O(\sqrt{GN} \log(1/\epsilon))$, while each iteration has a complexity of $O(G^3 N^6 + M G N^2)$.
The use of SeDuMi in the algorithm is convenient, because it not only yields a solution to
problem $\mathcal{FP}$ when the latter is feasible, but also provides a certificate of infeasibility otherwise.
As with problem $\mathcal{Q}_r$, actual runtime complexity will usually scale far slower with $G$, $N$, $M$ than
this worst-case bound.

When the algorithm terminates, the solution vector $\mathbf{v}$, obtained by the last feasible iteration, contains
the approximate solution to the relaxed problem $\mathcal{F}_r$, namely the blocks $\{\mathbf{X}_k\}_{k=1}^{G}$. The corresponding
(approximate) optimal value of problem $\mathcal{F}_r$ is an upper bound on the guaranteed received SINR
by all users, that can be achieved with total transmit power $P$. This bound can only be met in the
case when all blocks $\mathbf{X}_k$ are rank-one, so that their principal components can be chosen as optimum
beamforming vectors $\mathbf{w}_k$. Due to the relaxation of the rank constraints, this is generally not true.
Thus, post-processing of the relaxed solution is needed when the solution matrices $\{\mathbf{X}_k\}_{k=1}^{G}$ are
not all rank-one, so as to yield an approximate solution to the original joint max-min fair problem
$\mathcal{F}$. This can be accomplished by using a combined randomization - joint power control procedure,
similar to the one described in Section IV. Specifically, Gaussian randomization (e.g., see [11]) may
be used in a first step to create candidate sets of beamforming vectors $\{\mathbf{w}_k\}_{k=1}^{G}$ in the span of the
respective transmit covariance matrices. In a second step, the available transmit power $P$ is allocated
to the candidate beamforming vectors, by adjusting the power boost (or back-off) factors $p_k$ for
each multicast group. The set of $(p_k, \mathbf{w}_k)$ pairs which maximizes the minimum received SINR is
then chosen among all feasible solutions generated this way. Given a candidate set of beamforming
vectors, the transmit power can be optimally allocated by solving the following problem


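$$
\mathcal{MGPC}': \quad
\begin{aligned}
\max_{\{p_k \in \mathbb{R}\}_{k=1}^{G}} \ \min_{k \in \{1, \ldots, G\}} \ \min_{i \in \mathcal{G}_k} \quad & \frac{p_k \alpha_{k,i}}{\sum_{l \neq k} p_l \alpha_{l,i} + \sigma_i^2} \\
\text{s.t.:} \quad & \sum_{k=1}^{G} \beta_k p_k = P, \\
& p_k \ge 0, \ \forall k \in \{1, \ldots, G\},
\end{aligned}
$$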

where, as introduced in Section IV, $\beta_k = \|\mathbf{w}_k\|_2^2$ and $\alpha_{k,i}$ denotes the signal power received by user
$i$ from the stream directed towards users in multicast group $k$. Introducing a real positive auxiliary
variable $\gamma$, we can recast problem $\mathcal{MGPC}'$ as


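$$
\begin{aligned}
\max_{\{p_k \in \mathbb{R}\}_{k=1}^{G},\ \gamma \in \mathbb{R}} \quad & \gamma \\
\text{s.t.:} \quad & \frac{p_k \alpha_{k,i}}{\sum_{l \neq k} p_l \alpha_{l,i} + \sigma_i^2} \ge \gamma, \quad \forall k \in \{1, \ldots, G\}, \ \forall i \in \mathcal{G}_k, \\
& \sum_{k=1}^{G} \beta_k p_k = P, \\
& p_k \ge 0, \ \forall k \in \{1, \ldots, G\}, \ \text{and} \ \gamma \ge 0.
\end{aligned}
$$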


The bisection algorithm, described earlier in this section, can be used again to obtain a solution to
problem $\mathcal{MGPC}'$. The search interval is bounded below by $L = 0$, as before. However, we may now
further restrict the upper bound $U$ to the optimal value obtained for the relaxed problem $\mathcal{F}_r$. The
convex feasibility problem, which is to be solved in each iteration for a given choice of the positive
real $\gamma$, is


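$$
\mathcal{FP}': \quad
\begin{aligned}
\text{find} \quad & \mathbf{v}' \\
\text{s.t.:} \quad & \alpha_{k,i} p_k - \gamma \sum_{l \neq k} \alpha_{l,i} p_l - s_i = \gamma \sigma_i^2, \\
& \forall k \in \{1, \ldots, G\}, \ \forall i \in \mathcal{G}_k, \\
& \boldsymbol{\beta}^T \mathbf{p} = P, \\
& p_k \ge 0, \ \forall k \in \{1, \ldots, G\}, \\
& s_i \ge 0, \ \forall i \in \{1, \ldots, M\},
\end{aligned}
$$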

where $\mathbf{v}' \in \mathbb{R}^{M+G}$ denotes the variable vector $\mathbf{v}' = [\mathbf{s}^T \ \mathbf{p}^T]^T$. Problem $\mathcal{FP}'$ is a linear feasibility
problem with $G$ nonnegative variables and $M + 1$ linear inequality constraints. An interior point
method can generate either an $\epsilon$-feasible solution in $O(\sqrt{G} \log(1/\epsilon))$ iterations, each requiring at most
$O(G^3 + MG)$ arithmetic operations, or return a dual certificate showing the problem is infeasible.
When the bisection algorithm terminates, the solution vector $\mathbf{v}'$ obtained in the last feasible iteration
contains the boost / attenuation factors which optimally allocate the available transmit power among
the $G$ multicast groups, for the given set of candidate beamforming vectors. If this set of $(p_k, \mathbf{w}_k)$ pairs
yields larger worst-case received SINR than previously checked sets, then it is recorded; otherwise it
is discarded.
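
The max-min power allocation for a fixed candidate set can be prototyped by combining bisection with an LP feasibility test. The sketch below uses SciPy's `linprog` in place of SeDuMi and inequality constraints in place of slack variables; the function names, the tolerance, and the toy upper bound are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog


def power_feasible(alpha, beta, group_of, sigma2, P, gamma):
    """Return powers p achieving SINR >= gamma everywhere with beta^T p = P, or None."""
    G, M = alpha.shape
    A_ub = np.zeros((M, G))
    for i in range(M):
        k = group_of[i]
        A_ub[i] = gamma * alpha[:, i]     # + gamma * (interference powers)
        A_ub[i, k] = -alpha[k, i]         # - (intended signal power)
    b_ub = -gamma * np.asarray(sigma2, dtype=float)
    res = linprog(c=np.zeros(G), A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.asarray(beta, dtype=float).reshape(1, G), b_eq=[P],
                  bounds=[(0, None)] * G, method="highs")
    return res.x if res.status == 0 else None


def max_min_power_allocation(alpha, beta, group_of, sigma2, P, upper, eps=1e-4):
    """Bisection over the common SINR target gamma in [0, upper]."""
    lower, best = 0.0, None
    while upper - lower > eps:
        gamma = (lower + upper) / 2
        p = power_feasible(alpha, beta, group_of, sigma2, P, gamma)
        if p is None:
            upper = gamma                 # target too ambitious
        else:
            lower, best = gamma, p        # target met: record the allocation
    return lower, best


# Toy example: symmetric two-group setup with total power P = 2.
alpha = np.array([[1.0, 0.1], [0.1, 1.0]])
beta = np.array([1.0, 1.0])
gamma_hat, p_hat = max_min_power_allocation(
    alpha, beta, [0, 1], sigma2=[1.0, 1.0], P=2.0, upper=2.0)
```

By symmetry the optimum splits the power equally, giving a worst-case SINR of $1/(0.1 + 1) = 10/11$; in the full algorithm the argument `upper` would be set to the optimal value of the relaxed problem, as described above.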

Using the algorithm described so far, the cost of finding an approximate solution to the joint max-min
fair beamforming problem is that of solving $N_{\mathrm{iter}}$ SDP and $N_{\mathrm{rand}} N'_{\mathrm{iter}}$ LP feasibility problems,
where $N'_{\mathrm{iter}}$ is the number of bisection iterations executed for the solution of the $\mathcal{MGPC}'$ problem.

The  quality  of  the  final  approximate  solution  can  be  measured  by  the  ratio  of  the  optimal  value 
of  problem  Tr  (which,  as  mentioned  already,  is  actually  an  upper  bound)  to  the  maximum  attained 
optimal  value  of  problem  M.QVC' . 

VI.  Numerical  results 

A.  QoS  Approach 

In  Sections  III  and  IV  we  have  derived  a  two-step  algorithm  to  yield,  in  polynomial  time,  an 
approximate  solution  to  the  joint  QoS  multicast  beamforming  problem  Q.  The  first  step  of  the  proposed 
algorithm  consists  of  a  relaxation  of  the  original  problem  Q  to  problem  Qr.  The  original  problem  Q 
may  or  may  not  be  feasible;  if  it  is,  then  so  is  problem  Qr.  If  Qr  is  infeasible,  then  so  is  Q.  The 
converse  is  generally  not  true;  i.e.,  if  Qr  is  feasible,  Q  need  not  be  feasible.  In  order  to  establish 
feasibility of Q in this case, the randomization - MGPC loop should yield at least one feasible
solution. This is most often the case, as will be verified in the sequel. If the randomization - MGPC
loop fails to return at least one feasible solution, then the (in)feasibility of Q cannot be determined.
There  is,  therefore,  a  relatively  small  proportion  of  problem  instances  for  which  (in)feasibility  of  Q 
cannot  be  decided  using  the  proposed  approach.  It  is  evident  from  the  above  discussion  that  feasibility 
is a key aspect of problem Q and its proposed solution via problem Qr and the randomization -
MGPC loop. Feasibility depends on a number of factors; namely, the number of transmit antenna
elements N, the number and the populations of the multicast groups, G and $G_k$, respectively, the
channel characteristics $h_i$, the channel noise variances $\sigma_i^2$, and finally the desired receive SINR
constraints $\gamma_i$.

Beyond  feasibility,  there  are  two  key  issues  of  interest.  The  first  has  to  do  with  cases  for  which  the 
solution  to  the  relaxed  problem  Qr  yields  an  exact  optimum  of  the  original  problem  Q.  This  happens 
when the N × N solution blocks $X_k$, $k \in \{1, \ldots, G\}$, all turn out to be rank-one. In this case, the
associated principal components solve optimally the original problem Q, i.e., in such a case Qr is
not a relaxation after all. It is interesting to find the frequency of occurrence of such an event, whose
benefit is twofold: the problem is solved not only optimally, but also at a smaller complexity, since the
randomization step and the repeated solution of the ensuing MGPC problem are avoided. The second
issue  of  interest  has  to  do  with  the  quality  of  the  final  approximate  solution  to  problem  Q,  in  those 
cases  where  a  feasible  solution  can  be  found  using  the  proposed  two-step  algorithm.  As  in  [11],  a 
practical figure of merit for the quality of the final approximate solution (set of beamforming vectors
and power scaling factors) is the ratio of the total transmitted power corresponding to the approximate
solution over $\sum_{k=1}^{G} \mathrm{trace}(X_k)$ - the lower bound generated from the solution of the relaxed problem
Qr.
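
The figure of merit just described can be computed as in the short sketch below. It is only an illustration: the function names, the representation of beamformers as lists of complex weights, and the toy numbers are assumptions made for the example, and the trace values would in practice come from the SDR solution blocks.

```python
import math


def transmit_power(beamformers, powers):
    """Total transmitted power: sum over groups of p_k * ||w_k||^2."""
    return sum(
        p * sum(abs(w_n) ** 2 for w_n in w)
        for w, p in zip(beamformers, powers)
    )


def power_penalty(beamformers, powers, sdr_traces):
    """Ratio of the approximate solution's power to the SDR lower bound.

    sdr_traces holds trace(X_k) for each block of the relaxed solution.
    Returns the ratio both as a plain number and in dB.
    """
    lower_bound = sum(sdr_traces)
    ratio = transmit_power(beamformers, powers) / lower_bound
    return ratio, 10.0 * math.log10(ratio)


# Toy example (made-up numbers): two groups, two transmit antennas.
w = [[1 + 0j, 0j], [0j, 1j]]   # unit-norm candidate beamformers
p = [1.5, 2.5]                 # power scaling factors
ratio, ratio_db = power_penalty(w, p, sdr_traces=[1.0, 1.0])
```

A ratio of one means the approximate solution meets the lower bound; a ratio of two corresponds to about 3 dB of excess transmit power.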

We first consider the standard i.i.d. Rayleigh fading model, i.e., the elements of the N × 1 channel
vectors $h_i$, $\forall i \in \{1, \ldots, M\}$, are i.i.d. circularly symmetric complex Gaussian random variables
of  variance  1.  For  each  scenario  considered,  300  different  channel  snapshots  are  randomly  created 
according  to  the  aforementioned  model  and  fed  to  the  proposed  algorithm.  The  results  presented  in  this 
subsection  are  obtained  by  averaging  over  300  Monte-Carlo  runs,  using  300  Gaussian  randomization 
samples  in  each  run.  Tables  I,  II,  and  III  summarize  these  results,  for  N  (number  of  transmit  antenna 
elements)  set  to  4,  6,  and  8  respectively.  The  proposed  algorithm  is  tested  for  a  variety  of  choices  for 
M  (the  total  number  of  single-antenna  receivers)  and  G  (the  number  of  multicast  groups),  which  index 
the  rows  in  the  tables  (columns  1  and  2,  respectively).  The  users  are  considered  to  be  evenly  distributed 
among the multicast groups, i.e., $G_k = M/G$, $\forall k \in \{1, \ldots, G\}$. For each such configuration, the
QoS  downlink  beamforming  problem  is  solved  for  increasing  values  (in  the  6-20  dB  range,  see  column 
3)  of  the  received  SINR  constraints  (same  for  all  users),  provided  that  there  exist  channel  instances 
for which problem Qr is feasible. The noise variance is set to $\sigma^2 = 1$ for all channels. The percentage
of the 300 Monte-Carlo runs for which Qr is feasible is shown in column 4. Column 5 reports the
percentage of feasible solutions to problem Qr, which yield exact solutions to problem Q. This is
calculated as the percentage of problem instances for which all $X_k$ blocks in the solution of Qr turn out
having rank (essentially) equal to one (defined by the second largest eigenvalue being smaller than
$10^{-3}$ of the sum of all eigenvalues). Column 6 reports the percentage of problem instances for which,
once a feasible solution to problem Qr is found, the proposed algorithm’s second step, i.e., the ensuing
randomization - MGPC loop, yields at least one feasible solution for the original problem Q. The
next  two  columns  (7  and  8),  hold  the  mean  and  standard  deviation  of  the  quality  measure,  defined 
in  Section  IV  as  the  ratio  of  transmitted  power  corresponding  to  the  final  approximate  solution  over 
the  lower  bound  obtained  from  the  SDR  solution.  This  ratio  equals  one  when  rank  relaxation  is  exact 
(not  a  relaxation  after  all),  and  the  reported  statistics  depend  on  the  frequency  (see  column  5)  of 
this  event.  In  order  to  obtain  additional  insight  in  the  quality  of  the  approximation  step,  conditional 
statistics  are  also  reported  in  the  last  two  columns  (9  and  10)  after  excluding  exact  optimum  solutions 
from  the  calculation. 
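
The relation between the "all solutions" and the conditional "appr. solutions" statistics can be illustrated with a short calculation. The sketch below assumes, as the text suggests, that every exact instance contributes a ratio of one to the overall mean and that the average is taken over the instances for which a solution was obtained; the function name is hypothetical. The row used for the check is taken from Table II (N = 6, M = 8, G = 2, SINR = 6 dB).

```python
def mean_over_all(opt_pct, feas_appr_pct, mean_appr):
    """Overall mean of the power ratio from the tabulated percentages.

    opt_pct       : percentage of instances where the relaxation is exact
    feas_appr_pct : percentage of instances with at least one feasible solution
    mean_appr     : mean ratio over the non-exact (approximated) instances
    Exact instances are assumed to contribute a ratio of exactly one.
    """
    inexact_pct = feas_appr_pct - opt_pct
    return (opt_pct * 1.0 + inexact_pct * mean_appr) / feas_appr_pct


# Table II row: opt. = 80.67 %, feas. appr. = 99 %, appr. mean = 1.1233.
reconstructed = mean_over_all(80.67, 99.0, 1.1233)
```

For this row the reconstructed value agrees with the tabulated overall mean of 1.0228 to the reported precision, which is consistent with the assumed averaging convention.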

An  initial  comment,  regarding  the  feasibility  of  the  relaxed  problem  Qr,  is  that  in  all  configurations 
considered,  the  higher  the  target  SINR,  the  less  likely  it  is  that  Qr  is  feasible,  which  is  intuitive. 
Furthermore, Qr becomes more difficult to solve as the number G of multicast groups increases and/or
as more (randomly generated) users per multicast group are added, since in either case interference
is  higher.  Finally,  it  is  seen  that  increasing  the  number  of  transmit  antennas  (N)  improves  service,  as 
expected:  higher  receive  SINR  can  be  attained  by  more  users  in  more  multicast  groups. 


The most interesting observation, concerning the percentage of problems Qr for which the relaxation
is tight, is that it increases as the number of users per multicast group decreases; percentages
are significant especially when the number of users per group is smaller than or equal to the number of
transmit  antennas.  This  can  be  seen  in  two  ways:  either  by  holding  the  number  of  groups  fixed  while 
decreasing  their  populations,  or  by  fixing  the  total  number  of  users  and  distributing  them  in  more 
multicast  groups.  Trying  to  interpret  this  fact,  note  that  in  both  cases  the  problem  is  pushed  towards 
the  multiuser  (independent  information)  downlink  problem,  where  each  user  forms  a  multicast  group. 
The latter is known to be convex, and the associated SDP relaxation has been shown to be tight [1].
In  addition,  the  Qr  optimality  percentage  also  increases  with  target  SINR.  It  seems  as  if  rank-one 
solutions  are  more  likely  when  operating  close  to  the  infeasibility  boundary.  In  some  scenarios,  Qr 
consistently yields an exact solution of Q. That is, the $X_k$ blocks are all consistently rank-one. In
this  case,  no  further  randomization  is  needed  -  the  principal  components  of  the  extracted  blocks  are 
the  optimal  beamformers.  More  on  this  can  be  found  in  [7]. 
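
The "essentially rank-one" test used for the solution blocks (second largest eigenvalue below $10^{-3}$ of the sum of all eigenvalues) can be sketched as follows. The function names are hypothetical, the eigenvalues are assumed to have been computed beforehand by any Hermitian eigensolver, and clamping small negative values to zero is an added safeguard against numerical noise rather than something stated in the text.

```python
def is_essentially_rank_one(eigenvalues, rel_tol=1e-3):
    """True if the second largest eigenvalue is below rel_tol times the sum."""
    # Clamp tiny negative eigenvalues caused by round-off (assumption).
    eigs = sorted((max(ev, 0.0) for ev in eigenvalues), reverse=True)
    total = sum(eigs)
    if total <= 0.0:
        return False  # zero block: no principal component to extract
    second = eigs[1] if len(eigs) > 1 else 0.0
    return second < rel_tol * total


def relaxation_is_tight(blocks_eigenvalues, rel_tol=1e-3):
    """The relaxation counts as tight only if every block passes the test."""
    return all(is_essentially_rank_one(e, rel_tol) for e in blocks_eigenvalues)


# Toy eigenvalue sets for N = 4 (made-up numbers).
tight = relaxation_is_tight([[2.0, 1e-5, 0.0, 0.0], [0.7, 1e-6, 0.0, -1e-12]])
not_tight = relaxation_is_tight([[2.0, 1e-5, 0.0, 0.0], [1.0, 0.5, 0.0, 0.0]])
```

When the test passes for all blocks, the principal eigenvector of each block, scaled by the square root of its eigenvalue, serves directly as the beamformer and the randomization step can be skipped.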


As  far  as  the  approximation  step  of  the  proposed  algorithm  is  concerned,  we  can  distinguish 
two  cases.  In  most  of  the  scenarios  considered,  the  number  of  users  per  multicast  group  was  kept 
smaller than or equal to the number of transmit antenna elements, so that a realistic value of the receive
SINR could be guaranteed, for a significant fraction of the different channel instances. There, the
randomization - MGPC loop yields a feasible solution with a probability higher than 90% in most
cases  where  Qr  is  feasible;  this  solution  entails  transmission  power  that  is  under  two  times  (3  dB 
from)  the  possibly  unattainable  lower  bound,  on  average.  The  actual  numbers  for  each  configuration 
depend  on  the  number  of  the  Gaussian  randomization  samples;  300  have  proved  adequate  for  most 
configurations.  However,  when  a  relatively  low  target  SINR  is  to  be  guaranteed  to  a  number  of  users 
per  group  larger  than  the  number  of  antennas,  the  feasibility  of  the  approximation  decreases  and  the 
power penalty increases. This can be appreciated by looking at the lowest sub-matrices of Tables I-III.
Simulations are repeated for these configurations using 1000 Gaussian random samples. The
results  are  summarized  in  Table  IV,  where  an  extra  column  has  been  added  at  the  front,  indicating 
the  number  N  of  transmit  antenna  elements.  A  small  improvement  is  observed  in  the  quality  of  the 
approximation;  but  it  is  still  inadequate  for  the  last  configuration. 

B. Max-min-fair Approach

In  this  subsection  we  assess  the  performance  of  the  algorithm  derived  in  Section  V  for  the  JMMF 
downlink  multicast  beamforming  problem.  As  in  the  previous  subsection,  the  standard  i.i.d.  Rayleigh 
fading  model  is  used  for  Monte-Carlo  simulations.  Table  V  summarizes  the  results  obtained  using  the 
proposed  algorithm  for  300  Monte-Carlo  runs  and  1000  Gaussian  randomization  samples  each.  The 
value  of  the  available  transmit  power  P  is  set  to  1000  for  all  the  reported  simulation  results.  Note  at 
this  point  that,  contrary  to  the  single-group  multicasting  scenario  [11],  the  optimization  problem  in 
the  general  case  of  multiple  multicast  groups  is  interference-limited;  hence,  it  depends  on  the  value 
of P. Specifically, if the same problem is solved for two different values of P, the designed beams
will have the same shape, but the power allocation, i.e., the solution of the MGPC problem, will
differ.

Simulations  are  conducted  for  three  different  choices  (4,  6,  and  8)  of  the  number  N  of  transmit 
antenna  elements  and  a  variety  of  choices  for  the  number  of  receiving  single-antenna  mobile  users 
M,  shown  respectively  in  the  first  and  the  second  column  of  the  table.  The  users  are  considered 
to  be  evenly  distributed  among  the  G  multicast  groups;  their  number  is  stored  in  the  third  column. 
The fourth column reports the percentage of the Monte-Carlo runs for which all solution blocks $X_k$
of the relaxed problem Tr are essentially rank-one. As mentioned already, when this is the case,
the principal components of the blocks optimally solve the original joint max-min-fair problem, i.e.,
problem Tr is equivalent to and not a relaxation of problem T; hence, there is no need for the
algorithm’s second step (randomization - MGPC' loop). It is evident that this case occurs quite
frequently, with a frequency which drops as the number of users and the number of multicast groups
increase.

The next two columns (fifth and sixth) of Table V hold the average value (over all Monte-Carlo
runs) and the standard deviation, respectively, of the ratio of the optimal value of problem Tr to the
maximum attained optimal value of problem MGPC'. This is a measure of the quality of the overall
solution  obtained  using  our  proposed  approach.  The  final  two  columns  (seventh  and  eighth)  report  the 
same  statistics,  but  only  for  the  Monte-Carlo  runs  for  which  the  relaxation  is  not  essentially  tight,  for 
additional  insight  on  the  quality  of  the  approximation  step.  It  is  observed  that  the  minimum  achieved 
SINR  is  usually  very  close  (in  the  mean)  to  the  upper  bound  calculated  by  the  relaxed  SDP  problem 
Tr; thus, the approximation step yields high quality solutions. Compared with the respective results
for  the  single  multicast  group  case  [11],  the  multi-group  algorithm  consistently  performs  better.  In 
addition,  the  quality  of  the  approximation  becomes  better  (i.e.,  the  mean  of  the  ratio  drops)  as  a  given 
number  of  users  is  distributed  among  a  larger  number  of  multicast  groups  (e.g.,  see  the  case  of  12 
users,  divided  into  2,  3,  and  4  groups).  The  interpretation  given  for  the  QoS  formulation,  that  the 
problem is pushed towards the (convex) multiuser downlink problem, applies here, too.

Regarding  practical  execution  time,  the  SDP  feasibility  problem  Tr  is  solved  in  about  0.1  sec, 
on  a  typical  desktop  PC,  for  the  cases  considered.  The  variation  of  this  execution  time  is  almost 
negligible  for  the  tested  variation  of  the  values  M  and  G  (users  and  groups,  respectively).  However, 
an  approximately  linear  dependence  of  the  execution  time  on  the  number  of  transmit  antennas  N 
has  been  observed.  The  LP  feasibility  problem  is  solved  in  approximately  0.05  sec,  irrespective  of 
the scenario considered. Thus, in practice the algorithm needs approximately $1 + 0.5 N_{rand}$ sec (for
$N_{iter} \approx N'_{iter} \approx 10$), when the relaxation is not tight.
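
The run-time estimate above follows from the per-problem timings quoted in this paragraph: about ten SDP feasibility problems at roughly 0.1 sec each, plus about ten LP feasibility problems at roughly 0.05 sec each for every randomization sample. A minimal sketch of this arithmetic, with a hypothetical function name and the quoted timings as default arguments, is:

```python
def estimated_runtime_sec(n_rand, n_iter_sdp=10, n_iter_lp=10,
                          t_sdp=0.1, t_lp=0.05):
    """Rough run time when the relaxation is not tight.

    n_iter_sdp SDP feasibility problems are solved once, and n_iter_lp LP
    feasibility problems are solved for each of the n_rand randomizations.
    """
    return n_iter_sdp * t_sdp + n_rand * n_iter_lp * t_lp


runtime_300 = estimated_runtime_sec(300)     # about 1 + 0.5 * 300 sec
runtime_1000 = estimated_runtime_sec(1000)   # about 1 + 0.5 * 1000 sec
```

With these defaults the expression reduces to 1 + 0.5 times the number of randomizations, so the LP solves in the randomization loop dominate the total cost.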

C.  Experiments  with  Measured  Channel  Data 

The  performance  of  the  proposed  multicast  beamforming  algorithms  was  also  tested  on  measured 
channel data courtesy of iCORE HCDC Lab, University of Alberta in Edmonton, Canada. Measurements
were carried out using a portable 4 × 4 multiple-input multiple-output (MIMO) testbed that
operates in the 902-928 MHz (ISM) band. The transmitter (Tx) and the receiver (Rx) were equipped
with antenna arrays, each comprising four vertically polarized dipole antennas spaced λ/2 (≈ 16
cm) apart. The chip rate used for sounding was low enough to safely assume that the channel is not
frequency  selective.  More  details  on  the  testbed  configuration  and  the  procedure  used  to  estimate  the 
channel  gains  of  the  MIMO  channel  matrix  can  be  found  in  [5].  Datasets  and  a  detailed  description  of 
many  measurement  campaigns  in  typical  propagation  environments  are  available  at  the  iCORE  HCDC 
Lab  website  (http://www.ece.ualberta.ca/~mimo/).  The  most  pertinent  scenario  for  our  purposes  is 
the  stationary  outdoor  one,  called  Quad  and  illustrated  in  Figure  2.  Quad  is  a  150  by  60  meters 
lawn  surrounded  by  buildings  with  heights  from  approximately  15  to  30  meters.  The  Tx  location  was 
fixed,  whereas  the  Rx  was  placed  in  6  different  locations  (no  measurements  are  actually  provided  for 
location  4)  as  indicated  in  Figure  2.  For  every  Rx  location,  9  different  measurements  were  taken  by 
shifting the Rx antenna array on a 3 × 3 square grid with λ/4 spacing. Each measurement contains
about  100  4  x  4  channel  snapshots,  recorded  3  per  second;  thus  for  each  location  there  are  about 
900  MIMO  channel  gain  matrices  available.  We  form  multicast  groups  by  considering  each  receive 
antenna  at  each  location  as  a  separate  terminal,  and  grouping  terminals  in  1-3  locations  into  one 
multicast  group.  The  results  reported  in  Tables  VI-XII  and  XIII,  for  the  QoS  and  the  JMMF  problem 
formulations,  respectively,  are  obtained  by  averaging  over  the  900  channel  instances.  Channel  gains 
are  normalized  before  use,  dividing  by  the  average  channel  amplitude  for  the  respective  configuration. 
300  Gaussian  samples  are  employed  in  the  randomization  /  MGPC  loop.  The  main  findings  regarding 
performance  of  our  algorithms  applied  to  the  measured  channel  data  can  be  summarized  as  follows: 
• For 2 multicast groups and number of users per group equal to the number of Tx antennas (N = 4),
the relaxation Q → Qr is tight very frequently (70-100%) and the power penalty paid by the
approximation step is very small. These hold irrespective of the distribution of each group’s users in 1,
2, or even 3 locations (see Tables VI, VII, and VIII, respectively).

• For 2 multicast groups of 6 (or 8) users each, evenly distributed in 2 locations, the relaxation Q →
Qr is tight on more than half of the occasions (see Tables IX and XI). There exist channel instances
for which SINR up to 14 (or 12) dB can be guaranteed; such high SINR values are unattainable
under the corresponding i.i.d. Rayleigh fading scenario. The quality of approximation is good, even
though the number of users per group is larger than the number of transmit antenna elements.

• When 6 users in each of 2 multicast groups are evenly distributed in 3 locations, the relaxation
Q → Qr is tight less frequently (< 80%), and the problem is feasible only up to about 10 dB (see
Table X). The feasibility of the approximation step can drop below 80%.

• For 3 multicast groups (see Table XII) of 3 co-located users each, the relaxation Q → Qr is almost
always tight (> 90%) and feasible up to 10 dB of prescribed SINR. For 4 users per group it becomes
infeasible for SINR values larger than about 8 dB.

• When the number of users per multicast group is small, the relaxation T → Tr in the JMMF
formulation is tight in a high percentage of cases (see Table XIII). This percentage drops as the
number of users per multicast group increases. In all scenarios considered, the proposed algorithm
yields high quality approximate solutions.
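
The way multicast groups are formed from the measured data (each receive antenna at each location treated as a separate terminal, followed by normalization by the average channel amplitude) can be sketched as below. Several details are assumptions made only for illustration: the rows of each 4 × 4 snapshot are taken to correspond to Rx antennas, the choice of which antennas represent a location is arbitrary, the average is taken over all channel gains of the configuration, and the constant-valued toy matrices stand in for real measurements.

```python
from statistics import mean


def form_group(snapshot, picks):
    """Collect the channel vectors of one multicast group.

    snapshot : dict mapping a location label to a 4x4 channel matrix given
               as a list of rows (rows assumed to index Rx antennas)
    picks    : list of (location, [rx_antenna_indices]) pairs
    """
    return [list(snapshot[loc][rx]) for loc, rx_idx in picks for rx in rx_idx]


def normalize(groups):
    """Divide all gains by the average channel amplitude of the configuration."""
    avg = mean(abs(g) for group in groups for h in group for g in h)
    return [[[g / avg for g in h] for h in group] for group in groups]


def constant_matrix(value, n=4):
    """Toy stand-in for a measured n x n channel snapshot."""
    return [[value] * n for _ in range(n)]


snapshot = {"L1": constant_matrix(1 + 0j), "L2": constant_matrix(2 + 0j),
            "L3": constant_matrix(3j), "L6": constant_matrix(-2 + 0j)}

# Two groups of 4 users, each spread over 2 locations (2 terminals per location).
groups = normalize([
    form_group(snapshot, [("L2", [0, 1]), ("L3", [0, 1])]),
    form_group(snapshot, [("L1", [0, 1]), ("L6", [0, 1])]),
])
avg_after = mean(abs(g) for group in groups for h in group for g in h)
```

Each entry of `groups` is then a list of 4 × 1 channel vectors that can be passed to the beamforming algorithms in place of the simulated Rayleigh channels.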

VII.  Conclusions 

The  downlink  beamforming  problem  was  considered  for  the  general  case  of  multiple  co-channel 
multicast  groups,  under  two  design  criteria:  QoS,  in  which  we  seek  to  minimize  the  total  transmitted 
power while guaranteeing a prescribed minimum SINR at all receivers; and a fair objective, in
which  we  seek  to  maximize  the  minimum  received  SINR  under  a  total  power  constraint.  Both 
formulations  contain  single  group  multicast  beamforming  as  a  special  case,  and  are  therefore  NP-hard. 
Computationally  efficient  quasi-optimal  solutions  were  proposed  by  means  of  SDR  and  a  combined 
randomization  -  MGPC  loop.  Extensive  numerical  results  have  been  presented,  using  both  simulated 
(i.i.d.  Rayleigh)  and  measured  stationary  outdoor  wireless  channel  data,  showing  that  the  proposed 
algorithms  yield  high  quality  approximate  solutions  at  a  moderate  complexity  cost.  Interestingly,  our 
numerical  findings  indicate  that  the  solutions  generated  by  our  algorithms  are  often  exactly  optimal, 
especially  in  the  case  of  measured  channels.  In  certain  cases  this  optimality  can  be  proven  beforehand, 
and  alternative  convex  reformulations  of  lower  complexity  can  be  constructed;  see  [7]  for  further 
details.  In  other  cases,  a  theoretical  worst-case  bound  on  approximation  accuracy  can  be  derived,  and 
shown  to  be  tight;  on  this  issue,  see  [8]. 

Fig. 1. Co-channel multicast beamforming concept (note that groups need not be spatially clustered).

Fig. 2. Sample wireless channel measurement scenario from http://www.ece.ualberta.ca/~mimo/


TABLE I

MC SIMULATION RESULTS (RAYLEIGH); QoS TX BEAMFORMING; N = 4 TX ANTENNAS, 300 RANDOMIZATIONS

                 feas.    opt.   feas.      all solutions    appr. solutions
  M   G  SINR       Qr      Qr   appr.      mean      std      mean      std
         (dB)      (%)     (%)     (%)
  6   3     6    89.67   99.63     100    1.0000   0.0005    1.0079        0
  6   3     8    70.33     100       -         1        0         -        -
  6   3    10    45.33     100       -         1        0         -        -
  6   3    12       27     100       -         1        0         -        -
  6   3    14       14     100       -         1        0         -        -
  6   3    16        7     100       -         1        0         -        -
  8   2     6    98.33   79.66   98.31    1.0550   0.1710    1.2902   0.2950
  8   2     8    90.67   83.46   98.90    1.0838   0.3788    1.5366   0.8301
  8   2    10    73.33   83.18   98.18    1.1935   1.8118    2.2668   4.5446
  8   2    12       52   85.90   98.72    1.2018   2.1247    2.5542   5.8430
  8   2    14       32   88.54     100    1.0128   0.0593    1.1113   0.1462
  8   2    16    16.33   89.80   95.92    1.0426   0.1892    1.6679   0.4433
  8   2    18     9.33   92.86     100    1.0154   0.0669    1.2162   0.1847
  8   2    20        3   88.89     100    1.0543   0.1628    1.4884        0
  9   3     6     5.67     100       -         1        0         -        -
 12   2     6       42   49.21   79.37    1.6927   1.8918    2.8228   2.7314
 12   2     8    10.33   80.65   93.55    1.1921   0.5123    2.3929   0.4689
 12   2    10     1.33     100       -         1        0         -        -
 16   2     6     6.33   26.32   68.42    1.5619   1.2120    1.9131   1.4669

TABLE II

MC SIMULATION RESULTS (RAYLEIGH); QoS TX BEAMFORMING; N = 6 TX ANTENNAS, 300 RANDOMIZATIONS

                 feas.    opt.   feas.      all solutions    appr. solutions
  M   G  SINR       Qr      Qr   appr.      mean      std      mean      std
         (dB)      (%)     (%)     (%)
  8   2     6      100   80.67      99    1.0228   0.0734    1.1233   0.1301
  8   2     8      100   82.33   98.67    1.0162   0.0514    1.0979   0.0900
  8   2    10      100   87.67   97.67    1.0118   0.0485    1.1150   0.1067
  8   2    12      100      88      98    1.0102   0.0396    1.1004   0.0803
  8   2    14      100   89.33   98.67    1.0099   0.0478    1.1050   0.1211
  8   2    16      100   90.33      98    1.0089   0.0483    1.1143   0.1359
  8   2    18      100   92.67   99.33    1.0071   0.0409    1.1052   0.1235
  8   2    20      100      92   99.33    1.0064   0.0328    1.0862   0.0894
 12   2     6      100   35.33      94    1.2782   0.5915    1.4458   0.6977
 12   2     8      100      39      95     1.505   3.3075    1.8661   4.2771
 12   2    10       97   50.52   92.44    1.2513   0.5905    1.5542   0.7766
 12   2    12    86.67   56.92   94.23    1.2172   0.5583    1.5487   0.7800
 12   2    14    68.67   63.59   94.66    1.2330   0.8614    1.7098   1.3993
 12   2    16       47   69.50   92.91    1.2031   1.3485    1.8064   2.6241
 12   2    18    27.33   69.51   95.12    1.1972   0.9655    1.7324   1.7825
 12   2    20       17   82.35     100    1.0734   0.2537    1.4157   0.4921
 12   3     6       72   76.85   93.06    1.2440   1.1504    2.4010   2.4730
 12   3     8    19.67   83.05   94.92    1.0433   0.2193    1.3460   0.5645
 12   3    10     2.33     100       -         1        0         -        -
 12   4     6        4     100       -         1        0         -        -
 16   2     6    98.33   11.19   74.58    3.1376   4.9208    3.5149   5.2495
 16   2     8       75   15.11   63.56    2.4204   2.2800    2.8635   2.4499
 16   2    10    26.67   31.25   58.75    1.5876   1.0146    2.2554   1.1734
 16   2    12     4.33   38.46   84.62    4.5305   7.4475    7.4725   9.3851

TABLE III

MC SIMULATION RESULTS (RAYLEIGH); QoS TX BEAMFORMING; N = 8 TX ANTENNAS, 300 RANDOMIZATIONS

                 feas.    opt.   feas.      all solutions    appr. solutions
  M   G  SINR       Qr      Qr   appr.      mean      std      mean      std
         (dB)      (%)     (%)     (%)
 12   2     6      100      37   95.33    1.1814   0.2527    1.2964   0.2651
 12   2     8      100   35.67   96.33    1.1733   0.2409    1.2752   0.2532
 12   2    10      100   34.67      95    1.1734   0.2329    1.2730   0.2413
 12   2    12      100   41.33      96    1.1485   0.2099    1.2607   0.2194
 12   2    14      100      43      95    1.1478   0.2157    1.2700   0.2281
 12   2    16      100      45   94.33    1.1316   0.1956    1.2516   0.2074
 12   2    18      100   48.33   95.67    1.1226   0.2297    1.2477   0.2754
 12   2    20      100   53.33   95.33    1.0993   0.1765    1.2253   0.2059
 12   3     6      100   78.67   98.33    1.0372   0.1065    1.1862   0.1711
 12   3     8      100      79      98    1.0367   0.1081    1.1892   0.1783
 12   3    10    98.67   81.42   98.99    1.0452   0.1406    1.2545   0.2425
 12   3    12    94.67   85.21   97.54    1.0393   0.1464    1.3112   0.2945
 12   3    14    78.67   88.14   98.73    1.0559   0.2863    1.5207   0.7351
 12   3    16    52.33   92.99   99.36    1.0241   0.1114    1.3766   0.2571
 12   3    18    30.67   93.48   98.91    1.0291   0.1444    1.5303   0.3705
 12   3    20    17.67   98.11     100    1.0054   0.0396    1.2881        0
 12   4     6      100   93.33   99.67    1.0072   0.0342    1.1130   0.0822
 12   4     8    87.33   98.09   99.62    1.0037   0.0353    1.2439   0.1731
 12   4    10    42.33   97.64     100    1.0075   0.0635    1.3166   0.3272
 12   4    12       12   97.22     100    1.0099   0.0595    1.3568        0
 12   4    14     3.33     100       -         1        0         -        -
 16   2     6      100    9.67      93    1.8833   1.6259    1.9858   1.6882
 16   2     8      100   11.67      91    1.9955   2.2743    2.1419   2.4018
 16   2    10    99.67   15.05   86.62    1.8757   1.3169    2.0598   1.3800
 16   2    12    98.67   22.64   88.18    1.6957   1.5693    1.9359   1.7582
 16   2    14    94.67   31.69   88.38    1.7937   2.2986    2.2374   2.7755
 16   2    16    72.67   46.33   92.20    1.7075   3.8635    2.4220   5.3971
 16   2    18       54   59.26   93.21    1.3305   1.0350    1.9073   1.5628
 16   2    20    33.33      65      94    1.2662   0.8232    1.8630   1.3105
 24   2     6    99.33    0.34   43.96    6.7920   8.7394    6.8366   8.7582
 24   2     8    60.67    4.40   30.22    4.8706   6.2300    5.5294   6.5203
 24   2    10    11.67   14.29   34.29    3.6415   5.1995    5.5284   6.2925

TABLE IV

MC SIMULATION RESULTS (RAYLEIGH); QoS TX BEAMFORMING; 1000 RANDOMIZATIONS

                     feas.    opt.   feas.      all solutions    appr. solutions
  N   M   G  SINR       Qr      Qr   appr.      mean      std      mean      std
             (dB)      (%)     (%)     (%)
  4  12   2     6       42   49.21   85.71    1.7778   3.0955    2.8261   4.5637
  4  12   2     8    10.33   80.65   96.77    1.1785   0.4303    2.0710   0.3838
  4  16   2     6     6.33   26.32   73.68    1.5490   1.1660    1.8540   1.3843
  6  16   2     6    98.33   11.19   85.08    3.5662   9.0980    3.9547   9.7061
  6  16   2     8       75   15.11   68.89    2.6867   3.6744    3.1607   4.0366
  6  16   2    10    26.67   31.25      65    1.6679   1.5570    2.2863   1.9823
  6  16   2    12     4.33   38.46   92.31    2.9157   4.1172    4.2840   5.0828
  8  16   2     6      100    9.67   96.67    1.8918   3.8850    1.9909   4.0839
  8  16   2     8      100   11.67   95.33    1.8077   1.5349    1.9203   1.6067
  8  16   2    10    99.67   15.05   93.31    1.7600   1.9039    1.9061   2.0475
  8  16   2    12    98.67   22.64   92.23    1.6557   1.7557    1.8689   1.9758
  8  16   2    14    94.67   31.69   94.01    1.6887   2.7756    2.0389   3.3582
  8  16   2    16    72.67   46.33   95.87    1.3992   1.0721    1.7725   1.3939
  8  16   2    18       54   59.26   96.30    1.3021   0.8748    1.7856   1.2744
  8  16   2    20    33.33      65      94    1.2416   0.7279    1.7832   1.1492
  8  24   2     6    99.33    0.34   52.01    5.8627   7.2044    5.8942   7.2142
  8  24   2     8    60.67    4.40   35.71    5.7322  10.6103    6.3963  11.1810
  8  24   2    10    11.67   14.29   37.14    2.6311   3.7697    3.6505   4.6122

TABLE V

MC SIMULATION RESULTS (RAYLEIGH); JMMF TX BEAMFORMING; P = 1000, 1000 RANDOMIZATIONS

                opt.    all solutions  appr. solutions
  N   M   G       Tr     mean     std     mean     std
                 (%)
  4   8   2    75.33    1.011   0.047    1.043   0.087
  4  12   2    28.00    1.086   0.121    1.119   0.129
  4  16   2     5.67    1.214   0.192    1.227   0.190
  4  24   2        0    1.528   0.311    1.528   0.311
  4  12   3    11.67    1.053   0.074    1.060   0.076
  4  18   3        0    1.200   0.145    1.200   0.145
  4  12   4    16.00    1.022   0.037    1.026   0.039
  4  16   4     3.00    1.082   0.081    1.084   0.081
  6  12   2    61.33    1.046   0.104    1.119   0.139
  6  16   2    17.33    1.146   0.162    1.176   0.163
  6  24   2     0.67    1.557   0.324    1.561   0.322
  6  12   3    42.67    1.022   0.051    1.039   0.063
  6  18   3     5.00    1.178   0.150    1.187   0.149
  6  12   4    36.33    1.009   0.026    1.013   0.032
  6  16   4     9.00    1.053   0.071    1.058   0.072
  8  12   2    29.00    1.066   0.116    1.093   0.128
  8  16   2    50.67    1.066   0.133    1.133   0.164
  8  24   2     2.33    1.490   0.332    1.502   0.327
  8  32   2        0    1.996   0.447    1.996   0.447
  8  12   3    78.33    1.007   0.026    1.033   0.048
  8  18   3    17.67    1.098   0.126    1.118   0.130
  8  12   4    66.00    1.001   0.011    1.005   0.019
  8  16   4    25.33    1.025   0.054    1.037   0.060

TABLE VI

2 MULTICAST GROUPS; 4 USERS PER GROUP IN 1 LOCATION

          feas.    opt.   feas.      all solutions    appr. solutions
  SINR       Qr      Qr   appr.      mean      std      mean      std
  (dB)      (%)     (%)     (%)

Group 1 (4 at L1) & Group 2 (4 at L2)

  6-18      100     100       -         1        0         -        -
    20    99.89     100       -         1        0         -        -
    22    97.37     100       -         1        0         -        -
    24    84.82     100       -         1        0         -        -
    26    65.53     100       -         1        0         -        -
    28    45.21     100       -         1        0         -        -
    30    24.43     100       -         1        0         -        -

Group 1 (4 at L1) & Group 2 (4 at L3)

     6      100   99.89     100    1.0001   0.0035    1.1026        0
     8      100   99.89     100    1.0000   0.0014    1.0421        0
    10      100   99.77     100    1.0001   0.0014    1.0247   0.0216
    12      100   99.89     100    1.0001   0.0018    1.0536        0
    14      100   99.89     100    1.0001   0.0030    1.0876        0
    16      100   99.77     100    1.0002   0.0047    1.0805   0.0777
    18    93.16   97.92   99.88    1.0068   0.0761    1.3493   0.4321
    20    82.10   98.61     100    1.0024   0.0246    1.1715   0.1262
    22    72.98   99.69     100    1.0040   0.0934    2.2920   1.4971
    24    60.78   99.44     100    1.0102   0.2047    2.8121   2.4993
    26    35.01   99.02     100    1.0006   0.0086    1.0623   0.0752
    28    18.02     100       -         1        0         -        -
    30     9.46     100       -         1        0         -        -

Group 1 (4 at L5) & Group 2 (4 at L2)

     6      100   98.63     100    1.0009   0.0091    1.0636   0.0470
     8      100   98.63   99.89    1.0011   0.0115    1.0884   0.0549
    10    99.77   96.22   99.66    1.0083   0.0691    1.2398   0.2927
    12    96.80   93.28   98.94    1.0217   0.1365    1.3789   0.4407
    14    85.27   92.91   98.66    1.0364   0.4837    1.6236   1.9302
    16    64.61   96.64   99.65    1.0105   0.0824    1.3491   0.3363
    18    40.87   97.77   99.44    1.0029   0.0249    1.1709   0.0972
    20    23.74   99.52     100    1.0002   0.0030    1.0433        0
    22    10.05   97.73     100    1.0017   0.0148    1.0731   0.0930
    24     4.22     100       -         1        0         -        -

Group 1 (4 at L5) & Group 2 (4 at L7)

     6    97.72   82.15   97.78    1.0475   0.1900    1.2968   0.3908
     8    91.11   83.48   87.87    1.0486   0.2323    1.3306   0.5251
    10    81.64   87.29   98.05    1.0436   0.2147    1.3978   0.5314
    12    49.49   91.94   98.85    1.0427   0.2919    1.6110   0.9480
    14    18.93   90.97   98.19    1.0172   0.0782    1.2339   0.1868
    16     8.78   93.51     100    1.0101   0.0637    1.1551   0.2212
    18     2.85      92      96    1.0003   0.0017    1.0081        0
TABLE VII

2 MULTICAST GROUPS; 4 USERS PER GROUP IN 2 LOCATIONS

                                       all solutions     appr. solutions
SINR   feas. Qr  opt. Qr  feas. appr.  mean     std      mean     std

Group 1 (2 at L2 & 2 at L3) & Group 2 (2 at L1 & 2 at L6)
6      100       95.09    99.54        1.0112   0.0716   1.2507   0.2367
8      100       92.01    99.77        1.0264   0.1277   1.3388   0.3242
10     99.66     94.96    99.66        1.0059   0.0421   1.1258   0.1519
12     95.21     96.28    99.40        1.0121   0.1395   1.3863   0.7031
14     82.08     97.22    99.72        1.0107   0.0989   1.4243   0.4757
16     64.16     97.86    99.82        1.0092   0.0937   1.4705   0.5024
18     43.84     98.44    100          1.0040   0.0395   1.2534   0.2096
20     24.54     99.07    100          1.0019   0.0194   1.2014   0.0238

Group 1 (2 at L1 & 2 at L3) & Group 2 (2 at L2 & 2 at L6)
6      99.89     95.20    99.89        1.0207   0.2014   1.4405   0.8339
8      99.32     91.72    99.20        1.0313   0.1918   1.4162   0.5770
10     95.89     94.64    99.52        1.0287   0.3514   1.5852   1.4980
12     83.79     95.37    99.73        1.0168   0.1532   1.3848   0.6380
14     60.16     98.29    100          1.0059   0.0700   1.3431   0.4384
16     32.08     99.64    100          1.0030   0.0511   1.8562   0
18     13.01     100      -            1        0        -        -
20     4.00      100      -            1        0        -        -

Group 1 (2 at L2 & 2 at L6) & Group 2 (2 at L5 & 2 at L7)
6      100       81.39    99.43        1.0518   0.1817   1.2854   0.3404
8      99.66     81.21    98.51        1.0529   0.1808   1.3014   0.3345
10     96.12     85.87    98.81        1.0413   0.1690   1.3150   0.3642
12     82.76     90.21    98.90        1.0374   0.2634   1.4262   0.7954
14     57.42     93.24    98.82        1.1537   2.8925   3.7284   12.1003
16     30.48     92.51    98.88        1.0230   0.1203   1.3573   0.3329
18     13.47     94.92    99.15        1.0132   0.0740   1.3091   0.2106
20     5.94      92.31    100          1.0342   0.1755   1.4442   0.5297

TABLE VIII

2 MULTICAST GROUPS; 4 USERS PER GROUP IN 3 LOCATIONS

                                       all solutions     appr. solutions
SINR   feas. Qr  opt. Qr  feas. appr.  mean     std      mean     std

Group 1 (1 at L1, 1 at L2 & 2 at L3) & Group 2 (1 at L5, 2 at L6 & 1 at L7)
6      99.89     86.40    99.31        1.0320   0.2806   1.2459   0.7464
8      98.17     88.26    98.72        1.0319   0.2202   1.3006   0.6166
10     93.61     88.29    98.78        1.0307   0.1664   1.2890   0.4335
12     72.95     92.02    99.06        1.0276   0.1933   1.3888   0.6271
14     47.49     94.95    99.04        1.0116   0.0736   1.2800   0.2436
16     24.43     97.67    100          1.0333   0.4264   2.4273   2.6821
18     12.10     98.11    100          1.0017   0.0147   1.0897   0.0824

Group 1 (1 at L1, 1 at L3 & 2 at L6) & Group 2 (1 at L2, 1 at L5 & 2 at L7)
6      100       72.37    98.06        1.1180   0.5397   1.4503   0.9824
8      99.43     74.97    97.70        1.0897   0.3115   1.3856   0.5513
10     93.38     80.32    97.31        1.1802   2.7113   2.0320   6.4391
12     72.60     87.11    97.33        1.0465   0.2447   1.4429   0.6323
14     44.06     88.60    97.93        1.0741   0.7053   1.7781   2.1897
16     22.60     92.93    98.48        1.0292   0.1804   1.5183   0.5936

TABLE IX

2 MULTICAST GROUPS; 6 USERS PER GROUP IN 2 LOCATIONS

                                       all solutions     appr. solutions
SINR   feas. Qr  opt. Qr  feas. appr.  mean     std      mean     std

Group 1 (3 at L2 & 3 at L3) & Group 2 (3 at L1 & 3 at L6)
6      100       87.10    98.74        1.0465   0.2543   1.3943   0.6438
8      99.77     82.95    97.71        1.0690   0.6592   1.4569   1.6483
10     84.13     83.45    95.79        1.1520   1.4584   2.1791   3.9289
12     32.53     90.18    97.19        1.1019   0.9403   2.4109   3.3016
14     8.90      92.31    97.44        1.0269   0.1923   1.5115   0.7708

Group 1 (3 at L1 & 3 at L3) & Group 2 (3 at L2 & 3 at L6)
6      100       73.17    97.72        1.1900   1.4948   1.7566   2.9149
8      90.41     68.06    94.44        1.3882   2.4513   2.3839   4.4924
10     60.84     65.85    92.31        1.3287   1.0586   2.1469   1.7277
12     17.69     72.26    91.61        1.3294   1.1546   2.5589   2.1211

Group 1 (3 at L2 & 3 at L3) & Group 2 (3 at L5 & 3 at L7)
6      79.57     50.22    85.80        1.7201   3.1468   2.7365   4.7076
8      36.99     60.19    85.80        1.9784   6.2140   4.2771   11.0821
10     11.87     67.31    90.38        1.6219   1.7062   3.4356   2.6761

TABLE X

2 MULTICAST GROUPS; 6 USERS PER GROUP IN 3 LOCATIONS

                                       all solutions     appr. solutions
SINR   feas. Qr  opt. Qr  feas. appr.  mean     std      mean     std

Group 1 (2 at L1, 2 at L2 & 2 at L3) & Group 2 (2 at L5, 2 at L6 & 2 at L7)
6      92.69     59.73    91.38        1.4015   1.2621   2.1591   1.9312
8      61.99     67.04    88.40        1.3053   1.2901   2.2634   2.3898
10     13.47     83.05    94.92        1.3440   2.6684   3.7517   7.3235

Group 1 (2 at L1, 2 at L3 & 2 at L6) & Group 2 (2 at L2, 2 at L5 & 2 at L7)
6      94.86     46.93    87.97        1.9805   4.1862   3.1019   5.9382
8      44.41     32.96    75.32        1.8251   2.9113   3.7787   4.8256
10     7.19      82.54    96.83        1.1688   0.8581   2.1441   2.0659

Group 1 (2 at L1, 2 at L2 & 2 at L6) & Group 2 (2 at L3, 2 at L5 & 2 at L7)
6      70.21     24.07    81.63        2.2538   3.5065   2.7779   4.0640
8      32.53     41.40    81.40        2.0232   3.5807   3.0824   4.8974
10     7.42      47.6923  64.6154      1.1750   0.4518   1.6682   0.6888

Group 1 (2 at L1, 2 at L3 & 2 at L5) & Group 2 (2 at L2, 2 at L6 & 2 at L7)
6      83.33     58.36    89.18        1.6009   2.7181   2.7385   4.4104
8      43.38     71.05    89.21        1.1648   0.6820   1.8097   1.3349
10     14.61     80.47    93.75        1.0962   0.3556   1.6790   0.7211

Group 1 (2 at L2, 2 at L3 & 2 at L5) & Group 2 (2 at L1, 2 at L6 & 2 at L7)
6      97.26     47.30    92.49        1.4560   1.4843   1.9334   2.0170
8      68.95     63.91    88.91        1.2968   0.9642   2.0553   1.5861
10     10.89     86.34    93.44        1.0478   0.2604   1.6284   0.7519

TABLE XI

2 MULTICAST GROUPS; 8 USERS PER GROUP IN 2 LOCATIONS

                                       all solutions     appr. solutions
SINR   feas. Qr  opt. Qr  feas. appr.  mean     std      mean     std

Group 1 (4 at L2 & 4 at L3) & Group 2 (4 at L1 & 4 at L6)
6      100       80.37    97.72        1.0984   0.4743   1.5540   1.0098
8      96.23     83.87    97.03        1.0713   0.3764   1.5256   0.9007
10     48.86     82.71    96.50        1.1279   0.7362   1.8951   1.7752
12     5.82      72.55    92.16        1.5911   3.1056   3.7783   6.5226

Group 1 (4 at L1 & 4 at L3) & Group 2 (4 at L2 & 4 at L6)
6      85.96     56.57    89.51        1.7763   3.1965   3.1098   5.0015
8      45.89     51.74    83.83        2.2882   6.8722   4.3653   10.8132
10     15.18     73.68    92.48        1.3903   2.1492   2.9202   4.5190

Group 1 (4 at L2 & 4 at L3) & Group 2 (4 at L5 & 4 at L7)
6      50.46     39.14    66.74        1.8379   2.1140   3.0261   2.9037
8      2.86      64       68           1.2478   1.0216   5.21233  0

TABLE XII

3 MULTICAST GROUPS; 3-4 USERS PER GROUP IN 1 LOCATION

                                       all solutions     appr. solutions
SINR   feas. Qr  opt. Qr  feas. appr.  mean     std      mean     std

Group 1 (3 at L1), Group 2 (3 at L2) & Group 3 (3 at L3)
6      72.15     97.94    99.84        1.0069   0.1004   1.3638   0.6604
8      36.87     99.38    100          1.0006   0.0085   1.0961   0.0712
10     13.93     100      -            1        0        -        -

Group 1 (4 at L1), Group 2 (4 at L2) & Group 3 (4 at L3)
6      29.11     94.90    99.22        1.0155   0.1085   1.3556   0.4043
8      7.65      100      -            1        0        -        -

Group 1 (4 at L3), Group 2 (4 at L6) & Group 3 (4 at L7)
6      9.46      96.39    100          1.0174   0.1191   1.4820   0.4954

TABLE XIII

Measured channels; JMMF Tx beamforming; P = 1000, 300 randomizations

                                        opt.     all solutions     appr. solutions
Group 1             Group 2             Tr       mean     std      mean     std

4@L1                4@L3                75.26    1.0009   0.0096   1.0035   0.0191
2@L1, 2@L3          2@L2, 2@L6          87.56    1.0040   0.0230   1.0321   0.0580
2@L5, 2@L7          2@L2, 2@L6          82.42    1.0065   0.0308   1.0372   0.0655
1@L1, 1@L2, 2@L3    1@L5, 2@L6, 1@L7    77.72    1.0179   0.0570   1.0653   0.0937
3@L2, 3@L3          3@L5, 3@L7          20.89    1.0771   0.1126   1.0974   0.1185
3@L2, 3@L3          3@L1, 3@L6          45.89    1.0154   0.0516   1.0285   0.0674
2@L1, 2@L2, 2@L3    2@L5, 2@L6, 2@L7    37.56    1.0449   0.0915   1.0720   0.1071
2@L1, 2@L2, 2@L6    2@L3, 2@L5, 2@L7    27.85    1.0957   0.1315   1.1327   0.1381
4@L2, 4@L3          4@L1, 2@L6          38.13    1.0157   0.0437   1.0254   0.0533
4@L2, 4@L3          4@L5, 2@L5          9.13     1.1084   0.1380   1.1193   0.1402

Fast and Effective Hybrid Multidimensional Scaling Approach
for Node Localization in Wireless Sensor Networks

Georgios  Latsoudas,  Nicholas  D.  Sidiropoulos,  Senior  Member,  IEEE 


Abstract 

Given  a  set  of  pairwise  distance  estimates  between  nodes,  it  is  often  of  interest  to  generate  a 
map  of  node  locations.  This  is  an  old  nonlinear  estimation  problem  that  has  recently  drawn  interest 
in  the  signal  processing  community,  due  to  the  emergence  of  wireless  sensor  networks.  Sensor  maps 
are  useful  for  estimating  the  spatial  distribution  of  measured  phenomena,  and  for  routing  purposes. 

We  propose  a  two-stage  algorithm  that  combines  algebraic  initialization  and  gradient  descent.  In 
particular,  we  borrow  an  algebraic  solution  known  as  Fastmap  from  the  database  literature  and  adapt 
it  to  the  sensor  network  context,  using  a  specific  choice  of  anchor/pivot  nodes.  The  resulting  estimates 
are  fed  to  a  gradient  descent  iteration.  The  overall  algorithm  offers  very  competitive  performance 
at  significantly  lower  complexity  than  existing  solutions  with  similar  estimation  performance.  For  a 
certain  multiplicative  measurement  noise  model  that  is  often  adopted  in  the  literature,  we  also  derive 
the  pertinent  Cramer-Rao  bound  (CRB).  Simulations  indicate  that  the  performance  of  our  algorithm 
is  close  to  the  CRB  when  the  network  is  (close  to)  fully  connected,  in  the  sense  that  every  node 
can  estimate  its  distance  from  all  (most)  other  nodes.  Our  adaptation  of  Fastmap  also  turns  out  to 
make  a  big  difference  when  used  to  initialize  other  iterative  distributed  estimation  algorithms  that 
have  been  developed  specifically  for  sparse  networks. 

I.  Introduction 

The problem of node localization from pairwise distance estimates has recently attracted interest
in the signal processing and communications literature (e.g., [1], [2], [4], [6]), owing to the recent
interest in wireless sensor networks. Given a matrix of pairwise distances (usually estimated using
received signal strength measurements and a path loss model), the localization problem aims to
determine the (relative) node locations that generate these distances. In other words, one seeks a map
of sensor locations with a given (approximate) distance structure. This is a classic problem originating
in psychometrics [7], [8], known as Multi-Dimensional Scaling (MDS). There are many MDS flavors
and variants; perhaps the single most important version is metric MDS. The classical approach to
solving MDS is based on computing the principal components of a double-centered version of the
distance matrix. This works reasonably well (albeit not optimally, due to the double centering), but
its complexity is cubic in the number of nodes, and thus does not scale well with network size. A
popular alternative to principal component analysis (PCA) is the use of gradient descent or other
numerical optimization tools that aim to optimize a stress function. The stress function measures
the error between the given distances and those reproduced by a given configuration of points. The
drawback of gradient descent and related approaches is that they require accurate initialization.

Submitted to IEEE Trans. on Signal Processing, July 28, 2006. Earlier version of part of this work appeared in
conference form in the Proc. of IEEE CAMSAP, Dec. 12-14, 2005, Puerto Vallarta, Mexico. Supported by the U.S. ARO

We  propose  a  two-stage  MDS  algorithm  that  employs  an  algebraic  initialization  procedure  followed 
by  gradient  descent.  The  algebraic  initialization  is  based  on  the  Fastmap  [3]  algorithm,  borrowed  from 
the  database  literature.  Fastmap  is  a  linear-complexity  mapping  tool,  which  is,  however,  sensitive  to 
range  measurement  errors. 

Because distances are invariant to coordinate frame transformations (rotation, reflection,
shift), three so-called anchor nodes, whose positions are accurately known (e.g., via GPS),
are needed in order to fix a desired coordinate frame. Unfortunately, Fastmap is very sensitive
to  coordinate  alignment,  because  the  estimated  position  of  every  node  (and  thus  anchor  nodes  as 
well)  is  only  based  on  distances  to  selected  pivot  nodes  -  there  is  no  averaging.  In  order  to  mitigate 
this  problem,  we  advocate  a  judicious  choice  of  anchor/pivot  nodes,  placed  at  the  outer  edges  of  the 
network.  This  placement  bypasses  the  need  for  alignment  and  thus  alignment  errors,  thereby  providing 
a  high-quality  initialization  to  the  gradient  descent.  The  overall  algorithm  affords  better  localization 
accuracy  than  PCA-based  MDS,  at  substantially  lower  complexity  cost  (quadratic  in  the  number 
of  nodes).  Our  algorithm  is  also  competitive  with  respect  to  recent  low-complexity  solutions  (e.g., 
[2]),  especially  when  the  network  is  (close  to)  fully  connected.  Finally,  our  adaptation  of  Fastmap 
also  makes  a  big  difference  when  used  to  initialize  other  iterative  distributed  estimation  algorithms, 
specifically  developed  for  sparse  networks. 

The rest of this paper is structured as follows. In section II we explain in detail the PCA-based
MDS algorithm, and the standard gradient descent-based MDS. The Fastmap algorithm is briefly
reviewed in section III. In section IV we describe the proposed hybrid algorithm, while in section
V we summarize Costa's distributed MDS algorithm [2]. Section VI presents the CRB for a certain
multiplicative measurement noise model that is often adopted in the literature on node localization in
sensor networks [1], [6]. Section VII contains simulation results illustrating the performance of the
above algorithms, and the CRB. We remark that there are other algorithms in the recent literature
that assume a different measurement model (e.g., 0-1 node connectivity only, as in [6]), or propose
solutions of considerably higher complexity (e.g., as in [1]). We aim for the low-complexity regime,
for simplicity and scalability considerations. Conclusions are drawn in section VIII.


II.  Multidimensional  Scaling 

MDS  [7],  [8]  has  its  origins  in  psychometrics  and  psychophysics.  MDS  postulates  that  perceptual 
or objective “dissimilarities” or “distances” between pairs of abstract “objects” can be generated by
points  in  m-dimensional  space.  Any  set  of  distances  obeying  the  triangle  inequality  can  be  reproduced 
(or  closely  approximated)  by  choosing  m  to  be  sufficiently  large;  but  usually  m  =  2  or  m  =  3  is 
chosen  to  retain  the  systematic  variation,  and  also  for  ease  of  visualization.  Thus,  MDS  aims  to  find  a 
geometric  representation  of  the  data  in  2-D  or  3-D  space,  such  that  the  distances  between  data  points 
fit  as  well  as  possible  the  given  dissimilarity  information. 

We denote the dissimilarity measure (the estimated distances in our case) between objects i and j as
$d_{ij}$. The set of dissimilarities yields a measured distance matrix D. We also let $\delta_{ij}$ denote the Euclidean
distance between (generated by) two points $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ and $X_j = (x_{j1}, x_{j2}, \ldots, x_{jm})$, i.e.,

$$\delta_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}. \qquad (1)$$

In classical metric MDS, we estimate the node coordinates X by computing the m principal
components of a double-centered and element-wise squared version of the matrix D, denoted by B:

$$B = -\frac{1}{2} J P J, \qquad (2)$$

where $P = D \odot D$ is the matrix of squared distances ($\odot$ denotes the element-wise matrix product),
and J is the centering operator,

$$J = I - e e^T / N, \qquad (3)$$

with N denoting the number of objects (sensor nodes), and e denoting the N x 1 vector of all 1's.

For an N x N matrix D and for m dimensions, it can be shown that

$$-\frac{1}{2}\left( d_{ij}^2 - \frac{1}{N}\sum_{j=1}^{N} d_{ij}^2 - \frac{1}{N}\sum_{i=1}^{N} d_{ij}^2 + \frac{1}{N^2}\sum_{j=1}^{N}\sum_{i=1}^{N} d_{ij}^2 \right) = \sum_{k=1}^{m} x_{ik} x_{jk}, \qquad (4)$$

thus the estimated node coordinates are given by the m principal eigenvectors of the matrix B,
scaled by the square roots of the corresponding eigenvalues. That is, with $U_r$ containing the m
principal eigenvectors and $V_r$ diagonal containing the corresponding eigenvalues, $B_r = U_r V_r U_r^T$
is an optimal least squares approximation of B, and $X_r = U_r V_r^{1/2}$ is an approximation of the node
coordinates in m-dimensional space, up to a common coordinate rotation, reflection, and shift. An
alignment procedure is necessary to transform the estimated node locations to a desired frame of
reference.

It  is  important  to  note  that,  due  to  the  preprocessing  steps  prior  to  PCA,  this  approach  is  not 
equivalent  to  nonlinear  least-squares  parameter  fitting  using  the  original  measurements. 
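
The PCA-based procedure of (2)-(4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `classical_mds` and `pairwise_distances`, the random seed, and the 20-node test layout are assumptions made here. The recovered coordinates agree with the true ones only up to rotation, reflection, and shift, so the example compares reproduced distances rather than coordinates.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for points stored one per row."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, m=2):
    """Classical metric MDS: m-dimensional coordinates from an N x N distance matrix."""
    N = D.shape[0]
    P = D * D                                # element-wise squared distances
    J = np.eye(N) - np.ones((N, N)) / N      # centering operator J = I - ee^T/N
    B = -0.5 * J @ P @ J                     # double-centered matrix, eq. (2)
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:m]      # indices of the m principal components
    lam = np.clip(eigvals[idx], 0.0, None)   # noisy data can give small negative values
    return eigvecs[:, idx] * np.sqrt(lam)    # X_r = U_r V_r^{1/2}, one row per node

# Noiseless example: 20 nodes drawn uniformly in the unit square (illustrative setup).
rng = np.random.default_rng(0)
X_true = rng.uniform(0.0, 1.0, size=(20, 2))
D = pairwise_distances(X_true)
X_hat = classical_mds(D, m=2)
```

The eigendecomposition of the N x N matrix B is what drives the cubic complexity mentioned in the introduction.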

Direct minimization of a suitable stress function is an alternative to PCA-based MDS [7]. A
common¹ stress function is

$$\text{stress}^2 = \sum_{i,j} w_{ij} (\delta_{ij} - d_{ij})^2, \qquad (5)$$

where $d_{ij}$ is the measured distance, $\delta_{ij}$ is the Euclidean distance reproduced by the current
coordinate estimates, and $[w_{ij}]$ is the weight matrix, whose elements are equal to 1 if node j is in the
measurement range of node i and 0 otherwise. Minimization starts with an initial guess of the node positions (often
random), followed by gradient descent iterations. Initialization matters greatly in this context, because
the stress function is multi-modal. Furthermore, the number of iterations required for convergence
depends heavily on the quality of the initialization.
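
A minimal sketch of stress minimization by gradient descent follows, under stated assumptions: the weight matrix W is symmetric with a zero diagonal, the sum in (5) runs over all ordered pairs (which yields the factor 4 in the gradient), and no two nodes coincide. The names `stress`, `stress_gradient`, `gradient_descent`, and the step size used in the example are illustrative choices, not values taken from the paper.

```python
import numpy as np

def distances(X):
    """Euclidean distance matrix for points stored one per row."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(X, D, W):
    """Weighted least squares stress: sum_ij w_ij (delta_ij - d_ij)^2."""
    return float((W * (distances(X) - D) ** 2).sum())

def stress_gradient(X, D, W):
    """Gradient of the stress with respect to every node position."""
    diff = X[:, None, :] - X[None, :, :]         # diff[i, j] = x_i - x_j
    delta = distances(X)
    np.fill_diagonal(delta, 1.0)                  # avoid 0/0; diagonal weights are zero
    coef = W * (delta - D) / delta
    return 4.0 * (coef[:, :, None] * diff).sum(axis=1)

def gradient_descent(X0, D, W, step, n_steps):
    """Fixed number of fixed-step gradient descent iterations."""
    X = X0.copy()
    for _ in range(n_steps):
        X = X - step * stress_gradient(X, D, W)
    return X

# Illustrative run: 10 nodes, full connectivity, perturbed starting point.
rng = np.random.default_rng(1)
X_true = rng.uniform(0.0, 1.0, size=(10, 2))
D = distances(X_true)
W = np.ones((10, 10)) - np.eye(10)
X0 = X_true + 0.05 * rng.normal(size=X_true.shape)
X_gd = gradient_descent(X0, D, W, step=1e-3, n_steps=10)
```

Each gradient evaluation touches all node pairs, so one step costs on the order of N^2 operations; a suitable step size depends on N and on the scaling convention chosen for the stress.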


III.  Fastmap 


The basic element of Fastmap [3] is the projection of the nodes on a properly selected line. This
is achieved by selecting two objects $O_a$, $O_b$, called pivots, and projecting all other objects on the line
that passes through them. A pair of pivots is chosen for each of the m dimensions. The coordinates
(i.e., projections on the pivot line) of the objects can be found by employing the cosine law [3]. Thus,
the first coordinate for object $O_i$ is given by

$$x_i = \frac{d_{ai}^2 + d_{ab}^2 - d_{bi}^2}{2 d_{ab}}, \qquad (6)$$

where $d_{ij}$ is the dissimilarity measure between nodes i and j and a, b are the pivot objects. After
computing these coordinates for each object $O_i$, we consider a hyperplane which is orthogonal to the
pivot line. We then project the objects on this hyperplane, and repeat the process, this time using

$$d_{ij}'^2 = d_{ij}^2 - (x_i - x_j)^2, \quad i, j = 1, \ldots, N. \qquad (7)$$

A heuristic method is proposed in [3] for choosing the pivots as far as possible from one another.

¹ The negative log-likelihood of the observed data under a suitable measurement noise model would seem to be the
natural choice of stress function. This is not fortuitous in our context, however, because the resulting function is not only
multi-modal, but also leads to numerical difficulties. For this reason, a least squares criterion is preferred. While still
multi-modal, the adopted least squares criterion is much more benign from a numerical optimization viewpoint, and it often yields
performance close to the pertinent CRB, as will be seen in the simulations.

In database applications there is no “natural” or preferred coordinate frame of reference, thus the
final alignment step is not used, and anchors are not needed. In the context of sensor networks,
however, obtaining absolute position estimates is important. Unfortunately, Fastmap is very sensitive
to coordinate alignment, because the estimated position of every node (and thus anchor nodes as well)
is only based on distances to the chosen pivot nodes - there is no averaging. In order to mitigate
this problem, we advocate a particular choice of anchor/pivot nodes, placed at the outer edges of
the network. In particular, we assume that the sensor nodes are spread over a square, and place the
anchor nodes, which will also serve as pivots, at three vertices (see Fig. 1). This placement bypasses
the need for alignment and thus alignment errors, thereby providing a high-quality initialization to
the gradient descent. Anchors #1 and #2 also serve as pivots for determining the coordinates in the
first dimension, while anchors #2 and #3 double as pivots for the second dimension.

We assume that the anchor/pivot nodes which are used by Fastmap can take distance measurements
from all the sensor nodes (even if we don't have full connectivity information for the rest of
the nodes). This is reasonable if the anchor/pivot nodes are airborne or on higher ground.
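
The two Fastmap projections with anchors doubling as pivots can be sketched as follows. This is an illustration under assumptions made here, since Fig. 1 is not reproduced: anchors #1, #2, #3 are placed at the corners (1, 0), (0, 0), (0, 1) of the unit square, and each pivot pair is ordered so that the projection is measured from anchor #2 at the origin. The function name `fastmap_2d` is hypothetical. With noiseless distances this choice returns absolute coordinates directly, with no alignment step.

```python
import numpy as np

def fastmap_2d(D, pivots_dim1, pivots_dim2):
    """Two Fastmap projections using pre-selected pivot pairs (a, b).

    The coordinate along each pivot line is measured from pivot a towards pivot b.
    """
    D2 = D.astype(float) ** 2                        # squared dissimilarities
    a, b = pivots_dim1
    x = (D2[a, :] + D2[a, b] - D2[b, :]) / (2.0 * np.sqrt(D2[a, b]))   # cosine law
    # Squared distances after projecting on the hyperplane orthogonal to the pivot line.
    D2p = np.clip(D2 - (x[:, None] - x[None, :]) ** 2, 0.0, None)      # noise may push below 0
    a, b = pivots_dim2
    y = (D2p[a, :] + D2p[a, b] - D2p[b, :]) / (2.0 * np.sqrt(D2p[a, b]))
    return np.column_stack([x, y])

# Rows 0, 1, 2 hold anchors #1, #2, #3 (assumed corner positions); the rest are sensors.
rng = np.random.default_rng(2)
anchors = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
X_all = np.vstack([anchors, rng.uniform(0.0, 1.0, size=(15, 2))])
diff = X_all[:, None, :] - X_all[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
X_fm = fastmap_2d(D, pivots_dim1=(1, 0), pivots_dim2=(1, 2))
```

Only the rows of D that belong to the pivots enter the computation, which is why the anchors need distance measurements to every node while the other nodes do not need full connectivity.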

IV. A Two-Stage Approach

Fastmap is a fast algebraic mapping method that is rather sensitive to measurement errors, particularly
so in the final alignment step. In our context, this sensitivity can be mitigated by the proposed
choice of anchor/pivot nodes. The resulting estimates can be used as initialization for gradient descent.
Each step of gradient descent costs $O(N^2)$. Assuming good-enough initialization, only a few gradient
descent steps will be needed. This suggests that a substantial complexity reduction relative to PCA
and other techniques is possible. Interestingly, estimation accuracy can be improved as well, as we
will see.

The basic steps of the two-stage algorithm are shown in Table I. Denoting by $(x_i, y_i)$ the estimated
position of node i, the partial derivative of the stress function in (5) is given, up to a constant factor
that can be absorbed into the step size, by

$$\frac{\partial\,\text{stress}}{\partial x_i} = \sum_{j \neq i} w_{ij} \frac{\left(\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} - d_{ij}\right)(x_i - x_j)}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}, \qquad (8)$$

with a similar expression for the partial derivative with respect to $y_i$. For simplicity, but also to bound
complexity, a fixed number p = 10 of gradient descent steps is used in our simulations.

V. Costa's Algorithm

An iterative distributed estimation algorithm for MDS has been recently proposed in [2], using the
principle of majorization. The idea behind majorization is simple. Instead of directly minimizing a
complicated cost/stress function, majorization uses a simpler (usually quadratic) majorizing function
that lies over the said cost/stress function and is equal to it at the current parameter estimate.
Minimizing the majorizing function thus yields a new parameter estimate whose cost/stress is lower
than or equal to that of the previous one. Continuing in this fashion yields a sequence of parameter
estimates of decreasing cost/stress values. Specializing to the present context [2] yields the following
update

$$x_i^{(k)} = a_i X^{(k-1)} b_i^{(k-1)}, \qquad (9)$$

where $X^{(k)}$ is the matrix which contains the position estimates for all the sensor nodes in the kth
iteration of the algorithm, and $a_i$ is a parameter given by

$$a_i^{-1} = \sum_{j=1, j \neq i}^{N-M} w_{ij} + \sum_{j=N-M+1}^{N} 2 w_{ij}, \qquad (10)$$

where M is the number of anchor nodes (M = 3 in the 2-D case). The entries of the N x 1 vector
$b_i$ are given by

$$[b_i]_j = \begin{cases} w_{ij}\,(1 - d_{ij}/\delta_{ij}), & j \le N - M,\ j \ne i, \\ 2 w_{ij}\,(1 - d_{ij}/\delta_{ij}), & j > N - M,\ j \ne i, \end{cases}$$

$$[b_i]_i = \sum_{j=1, j \neq i}^{N-M} w_{ij}\, d_{ij}/\delta_{ij} + \sum_{j=N-M+1}^{N} 2 w_{ij}\, d_{ij}/\delta_{ij}, \qquad (11)$$

where $\delta_{ij}$ is the reproduced distance computed from the most recent coordinate estimates. The
algorithm runs iteratively and the requisite computation can be performed at each node in a distributed
fashion (every node computes its own position coordinates and the corresponding part of the cost
function). The iterations continue until the associated sequence of costs converges within $\epsilon$ in the
Cauchy sense. The cost function which the authors in [2] propose is:


$$S = \sum_{i=1}^{N-M} S_i, \qquad (12)$$

where the local cost functions $S_i$ are given by:

$$S_i = \sum_{j=1, j \neq i}^{N-M} w_{ij} (d_{ij} - \delta_{ij})^2 + \sum_{j=N-M+1}^{N} 2 w_{ij} (d_{ij} - \delta_{ij})^2. \qquad (13)$$

When the difference between the previous and the current cost values becomes smaller than a threshold
$\epsilon$, the algorithm terminates. Termination is guaranteed because a single iteration can reduce or
maintain, but cannot increase, the cost, which is also bounded from below.
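
One pass of the update in (9)-(11) over the unknown nodes might look as follows. This is a sketch under assumptions made here, not the implementation of [2]: nodes are updated sequentially, the last M rows of the position array are anchors, positions are stored one node per row (so the product in (9) becomes `b @ X`), every unknown node has at least one neighbor, and no two nodes coincide. The name `costa_sweep` is hypothetical.

```python
import numpy as np

def costa_sweep(X, D, W, n_unknown):
    """One sequential pass of the majorization update over the unknown nodes.

    X : (N, m) current positions; rows n_unknown..N-1 are anchors and stay fixed.
    D : (N, N) measured distances, W : (N, N) 0/1 weights.
    """
    X = X.copy()
    N = X.shape[0]
    scale = np.ones(N)
    scale[n_unknown:] = 2.0                      # anchor terms are weighted twice
    for i in range(n_unknown):
        delta = np.linalg.norm(X - X[i], axis=1) # reproduced distances from node i
        delta[i] = 1.0                           # placeholder; entry i is set separately
        w = scale * W[i]
        w[i] = 0.0                               # exclude j = i from all sums
        ratio = D[i] / delta                     # d_ij / delta_ij
        b = w * (1.0 - ratio)                    # entries j != i of b_i
        b[i] = np.sum(w * ratio)                 # entry j = i of b_i
        a = 1.0 / np.sum(w)                      # a_i, from eq. (10)
        X[i] = a * (b @ X)                       # x_i = a_i * X b_i
    return X

# Illustrative check on noiseless data: 9 unknown nodes followed by 3 anchors.
rng = np.random.default_rng(3)
X_true = rng.uniform(0.0, 1.0, size=(12, 2))
diff = X_true[:, None, :] - X_true[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
W = np.ones((12, 12)) - np.eye(12)
X_next = costa_sweep(X_true, D, W, n_unknown=9)
```

When the measured and reproduced distances agree, every off-diagonal entry of b_i vanishes and the i-th entry equals 1/a_i, so the true configuration is a fixed point of the update; the example exercises exactly this property.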


VI.  Measurement  Noise  Model  and  Cramer-Rao  Bound 

Pairwise distance estimates will inevitably contain measurement errors, which are generally amplified
with increasing distance between nodes. The choice of measurement noise model depends on
many factors, and is application-specific. We shall adopt a certain multiplicative noise model from
the recent literature on node localization in wireless sensor networks [1], [6], in which the distance
measurement error is proportional to the actual distance between the pair of nodes. Thus the measured
distance $d_{ij}$ between nodes i, j is assumed to be drawn from

$$d_{ij} \sim \delta_{ij} + \delta_{ij}\, \mathcal{N}(0, e_r^2), \qquad (14)$$

where $\delta_{ij}$ is the actual distance between nodes i, j and $e_r^2$ is the range error variance. We also assume
that the measurements are reciprocal (or symmetrized by averaging prior to further processing); i.e.,
$d_{ij} = d_{ji}$.

In this section, we derive the Cramer-Rao Bound (CRB) for node localization using the above
multiplicative noise model. Analogous derivations for different noise models employed in [2], [5] can
be found in [5]. An explanation of the difference between the RSS noise model described therein
and our multiplicative noise model can be found in the appendix.

Define the vector of sensor parameters $\gamma = (\gamma_1\ \gamma_2\ \cdots\ \gamma_N)$. Each $\gamma_i$ contains the location coordinates
for node i, i.e., $\gamma_i = (x_i, y_i)$ in the 2-D case. The unknown parameter vector for the N - 3
sensors whose locations are unknown² is defined as $\theta = (\theta_x\ \theta_y)$, with $\theta_x = (x_1, x_2, \ldots, x_{N-3})$
and $\theta_y = (y_1, y_2, \ldots, y_{N-3})$. This is the vector we wish to estimate. Sensors i, j perform pairwise
observations $d_{ij}$. We assume that the observations are statistically independent for i < j. The
density function of the observations $d_{ij}$ given the locations of nodes i, j is denoted by $f(d_{ij} \mid \gamma_i, \gamma_j)$.
Thus the joint log-likelihood is

$$l(D \mid \gamma) = \sum_{i=1}^{N} \sum_{j \in H(i),\, j < i} l_{i,j}, \qquad l_{i,j} = \log f(d_{ij} \mid \gamma_i, \gamma_j), \qquad (15)$$

where H(i) is the set of nodes which are in the range of node i.

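The CRB for coordinate $\theta_i$ is $\mathrm{cov}(\theta_i) \ge [F_\theta^{-1}]_{ii}$, where $F_\theta$ is the Fisher Information Matrix (FIM),
given by

$$F_\theta = \begin{bmatrix} F_{xx} & F_{xy} \\ F_{xy}^T & F_{yy} \end{bmatrix}. \qquad (16)$$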

The  elements  for  the  sub-matrix  Fxx  are  given  by 


Fxx(M) 


J2jeH(k)  E\dxl^,j]i  k  —  l 
-lH(k)(})E[QXkQx j  1  k  /  l 


(17) 


2In  the  2-D  case  we  need  3  anchor  nodes. 
where $I_{H(k)}(l)$ is the indicator function (1 if l is in the range of k, 0 otherwise). Similar expressions
hold for the $F_{xy}$, $F_{yy}$ sub-matrices. For full connectivity, the elements of the above matrices are

E2(xk-XjY  1  e-l+1  (  1  ,  A  (xk-Xj)2  \  1/1  q  {xk  XjY  \  U—l 

j  6*.  5L  e?  V  SI,  Si,  >  el^SL  Hi  i’  K~h 


F  XX(M)  =  < 


2  ( Xj  )■  1 

^kj  ukj 


(  1  r,(Xk~Xl)2  ,  l+e2fA  (Xk-Xl)2  1  1  /Q  (Xk—Xl)2  1  ^  U  -L  1 

\d2kl  Z  rt4.  "t"  P.2  Sf.  S?J  f2\0  S2.  S'!.))-  K  T1  1 


Wt ' 


si  $li ' 


(18) 


(similar  expressions  can  be  obtained  for  the  elements  of  Fyy)  and 

-  E,  2(i*  -  Xj)(yt  -n)Y-  +  3  (-.-*$*-»>,  k  =  (, 


FXX(M)  =  < 


ukj 


ukj 


~(-2(zk  -  XiXtnt  -  yi)£  +  4^*>-x$»-*>  -  k  /  l 


(19) 
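
As a compact cross-check on the structure of the FIM entries, the following sketch applies the standard Fisher information formula for a scalar Gaussian whose mean and variance both depend on the parameter. It assumes independent links, full connectivity, and $d_{kj} \sim \mathcal{N}(\delta_{kj}, e_r^2 \delta_{kj}^2)$ as in the multiplicative model; the constant $c$ is notation introduced here, and the simplified closed form below is an independent derivation rather than a transcription of (18)-(19).

```latex
% General fact: for d ~ N(mu(theta), sigma^2(theta)),
%   I(theta) = (mu')^2 / sigma^2 + ((sigma^2)')^2 / (2 sigma^4).
% Here mu = delta_{kj} and sigma^2 = e_r^2 delta_{kj}^2, so per link
\[
  I(\delta_{kj})
  = \frac{1}{e_r^2 \delta_{kj}^2}
  + \frac{(2 e_r^2 \delta_{kj})^2}{2 e_r^4 \delta_{kj}^4}
  = \frac{1 + 2 e_r^2}{e_r^2\, \delta_{kj}^2}.
\]
% Chain rule with  d(delta_{kj})/d(x_k) = (x_k - x_j)/delta_{kj} = - d(delta_{kj})/d(x_j):
\[
  F_{xx}(k,l) =
  \begin{cases}
    c \displaystyle\sum_{j \neq k} \frac{(x_k - x_j)^2}{\delta_{kj}^4}, & k = l, \\[2ex]
    -\,c\, \dfrac{(x_k - x_l)^2}{\delta_{kl}^4}, & k \neq l,
  \end{cases}
  \qquad
  F_{xy}(k,l) =
  \begin{cases}
    c \displaystyle\sum_{j \neq k} \frac{(x_k - x_j)(y_k - y_j)}{\delta_{kj}^4}, & k = l, \\[2ex]
    -\,c\, \dfrac{(x_k - x_l)(y_k - y_l)}{\delta_{kl}^4}, & k \neq l,
  \end{cases}
  \qquad
  c = \frac{1 + 2 e_r^2}{e_r^2}.
\]
```

In these expressions k and l index the nodes with unknown positions, while the sums over j run over all other nodes, anchors included.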


VII.  Simulation  Results 

In this section, we compare the aforementioned algorithms in the context of node localization in
sensor networks. Network nodes are considered to be uniformly distributed in a square with area
equal to 1, i.e., the x and y coordinates of the sensor nodes are uniformly distributed in [0, 1]. We
employ the alignment procedure described in [4], when necessary, in order to estimate the absolute
coordinates, and adopt root mean squared error as our estimation performance metric:

$$\mathrm{RMSE} := \sqrt{\frac{\sum_{i=1}^{N} \left[ (x_{ri} - x_{ei})^2 + (y_{ri} - y_{ei})^2 \right]}{N}}, \tag{20}$$

where $x_{ei}, y_{ei}$ are the estimated coordinates, and $x_{ri}, y_{ri}$ are the actual coordinates of sensor i. The
computational complexity orders of the various algorithms under consideration are listed in Tables II
and IV, for the case of full and partial connectivity, respectively.
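
As a minimal sketch of the RMSE metric in (20), assuming it is the square root of the squared position error summed over both coordinates and averaged over the N nodes, the computation could look as follows. The function name `rmse` and the list-of-pairs input layout are illustrative choices, not part of the original work.

```python
import math

def rmse(actual, estimated):
    """Root mean squared position error over N nodes.

    actual, estimated: equal-length lists of (x, y) coordinate pairs.
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xr - xe) ** 2 + (yr - ye) ** 2
                for (xr, yr), (xe, ye) in zip(actual, estimated))
    return math.sqrt(total / len(actual))
```

Any alignment step (as in [4]) is assumed to have been applied to the estimates before this function is called.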

The  baseline3  MDS  algorithm  is  based  on  PCA  of  the  doubly-centered  matrix  of  squared  distances, 
and  henceforth  referred  to  as  PCA-based  MDS.  We  also  implemented  Costa’s  iterative  majorization 
algorithm.  We  tried  both  a  random  initialization  and  the  alternative  initialization  strategy  suggested 
in [2]. The latter strategy often yields complex coordinates when the triangle inequality fails due to
measurement errors, whereas the former (random) yields unsatisfactory results that do not improve
with decreasing error variance. It is clear that Costa’s algorithm is sensitive with respect to initialization,
and could benefit from a better “warm start”. For this reason, we also tried using our adaptation
of  Fastmap  to  initialize  Costa’s  iteration. 
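
For orientation, the PCA-based MDS baseline mentioned above can be sketched along textbook lines: double-center the matrix of squared distances and keep the leading eigenvectors. This is a generic classical-MDS sketch under that assumption, not the authors' implementation; the name `classical_mds` and the clipping of negative eigenvalues are illustrative choices.

```python
import numpy as np

def classical_mds(D, m=2):
    """Classical (PCA-based) MDS from an N x N matrix of pairwise distances.

    Returns an N x m array of coordinates, determined only up to
    rotation, reflection and translation.
    """
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # doubly-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:m]       # indices of the m largest
    lam = np.clip(eigvals[idx], 0.0, None)    # noisy data can yield negatives
    return eigvecs[:, idx] * np.sqrt(lam)
```

The eigendecomposition of the N x N matrix is what gives this baseline its cubic complexity order, and the double centering is the step that the text later identifies as coloring the noise.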

3 PCA-based MDS is not directly applicable in the case of partial connectivity.

Fig. 2 shows the RMSE performance of the various algorithms (PCA, Fastmap, Fastmap+SD,
Fastmap+Costa, and Costa with random initialization) for a sensor network with 80 sensors, as a
function of $e_r^2$. Distance measurements are drawn from the multiplicative noise model in (14). The
corresponding  Cramer-Rao  Bound  (CRB)  is  also  plotted  as  a  benchmark.  For  the  SD  step  of  the 
proposed  algorithm  (Fastmap+SD),  a  step-size  of  A  =  0.01  and  p  =  10  SD  iterations  were  used. 
The  convergence  threshold  in  Costa’s  algorithm  was  set  to  e  =  0.1.  From  Fig.  2,  we  observe  that 
stand-alone  Fastmap  exhibits  poor  performance,  which  quickly  degrades  with  increasing  range  error 
variance.  When  randomly  initialized,  Costa’s  algorithm  also  performs  poorly  in  this  setup,  and  its 
performance  does  not  improve  with  decreasing  error  variance.  Fastmap+SD  and  Fastmap+Costa  are 
the  best  options  from  the  viewpoint  of  RMSE  performance,  and  remain  relatively  close  to  the  CRB, 
especially  for  low  range  error  variance.  Interestingly,  the  proposed  algorithm  is  not  only  less  complex, 
but  also  more  accurate  than  PCA.  This  is  partially  attributed  to  the  fact  that  PCA  uses  double  centering, 
which  colors  the  noise,  whereas  the  proposed  algorithm  directly  minimizes  the  stress  function. 

Fig.  3  shows  corresponding  results  for  a  network  with  200  nodes  (A  =  0.005;  the  remaining  setup  is 
the  same  as  Fig.  2).  The  estimation  accuracy  of  PCA,  Fastmap+SD,  and  Fastmap+Costa,  is  improved 
relative  to  Fig.  2,  as  expected.  Fastmap  does  not  benefit,  due  to  the  lack  of  (implicit  or  explicit) 
averaging, while Costa’s algorithm with random initialization performs about the same as in Fig. 2.

We  also  tried  an  additive  measurement  noise  model,  i.e.,  the  measurements  are  drawn  from 

$$d_{ij} \sim \delta_{ij} + \mathcal{N}(0, e_r^2), \tag{21}$$

where  the  variance  of  the  measurement  error  is  independent  of  the  distance  between  the  two  nodes. 
The  results  are  shown  in  Fig.  4  for  the  case  of  80  nodes,  and  in  Fig.  5  for  the  case  of  200 
nodes.  We  observe  again  that  Fastmap+SD  and  Fastmap+Costa  yield  approximately  the  same  RMSE 
performance,  significantly  outperforming  stand-alone  Fastmap  and  PCA. 

One  might  also  wonder  whether  the  RMSE  comparison  of  the  various  algorithms  is  sensitive  with 
respect  to  the  statistics  of  the  multiplicative  noise  (normal  versus  log-normal,  see  also  the  appendix). 
Fig. 6 presents simulation results for the log-normal multiplicative noise model employed in [2]. We
observe  that  the  relative  performance  ordering  of  the  different  algorithms  is  the  same  as  in  Fig.  2. 

Fig. 7 shows the average computational cost in floating point operations (FLOPS) of Fastmap+SD
and Fastmap+Costa, as a function of the number of nodes, N. We observe that Fastmap+SD exhibits
significantly lower complexity (almost five times lower) than Fastmap+Costa. The values of the step-size
A used for the different values of N are listed in Table III.

In  all  simulation  results  presented  so  far,  the  network  was  assumed  to  be  fully  connected,  i.e., 
distance  measurements  were  available  for  each  pair  of  nodes  in  the  network.  We  now  switch  to  partially 
connected scenarios. We assume that nodes which are further apart than a certain threshold (radio
range) cannot hear each other; the corresponding distance measurement is then marked as unavailable, and
the associated weight in the stress function is set to zero. An exception is that every node is assumed
to be within range of each of the three anchor/pivot nodes. We adopt the multiplicative noise model
in (14), and consider two cases: in the first the measurement range is 0.14 and in the second it is 0.3.
Fig. 8 and Fig. 9 show the RMSE performance of Fastmap+SD, Fastmap+Costa, and the CRB (which
accounts for the missing data) for the two cases, as a function of range error variance, for N = 80
nodes.  Table  V  lists  the  values  of  A  used  in  the  SD  iteration  for  the  three  different  connectivity 
scenarios  (fully  connected,  partially  connected  with  measurement  range  equal  to  0.3,  or  0.14)  and 
N  =  80.  For  Fastmap+SD,  we  tried  two  different  values  for  the  number  of  SD  iterations:  p  =  10  and 
p  =  30.  From  Fig.  8  and  Fig.  9,  we  observe  that  Fastmap+Costa  outperforms  Fastmap+SD  in  terms 
of  RMSE,  even  when  p  =  30  is  used  in  SD.  This  is  in  contrast  to  the  case  of  full  connectivity.  The 
corresponding  FLOP  counts  in  Fig.  10  and  Fig.  11  show  that  Fastmap+SD  with  p  =  10  maintains  its 
computational  complexity  advantage  compared  to  Fastmap+Costa.  Increasing  p  improves  the  RMSE 
performance  of  Fastmap+SD,  but  at  the  cost  of  computational  complexity,  which  is  brought  closer  to 
that  of  Fastmap+Costa.  We  conclude  that  while  Fastmap+SD  offers  lower  complexity  for  the  same 
RMSE  performance  as  Fastmap+Costa  in  the  fully  connected  case,  there  is  a  performance  penalty  for 
the  reduced  complexity  in  the  partially  connected  case,  wherein  Fastmap+Costa  may  be  preferable. 

VIII.  Conclusions 

We  have  proposed  a  hybrid  two-stage  node  localization  algorithm  that  offers  better  accuracy  than 
existing  alternatives  of  the  same  (and,  in  certain  cases,  even  higher)  complexity  order.  The  new 
algorithm  employs  Fastmap,  coupled  with  judicious  selection  of  anchor  nodes  that  double  as  pivots, 
to  generate  a  computationally  cheap  yet  sufficiently  accurate  initialization  for  gradient  descent.  The 
new  algorithm  is  particularly  attractive  (in  terms  of  the  offered  performance-complexity  trade-off)  in 
the  case  of  dense  networks. 

We  also  proposed  using  our  adaptation  of  Fastmap  as  initialization  for  Costa’s  algorithm.  The  latter 
combination  appears  useful  for  sparse  networks,  in  which  case  it  attains  better  estimation  performance 
than  Fastmap+SD,  albeit  at  a  higher  complexity  cost.  Our  simulations  indicate  that,  in  the  context 
of  our  present  application,  Fastmap+SD  uniformly  outperforms  PCA-based  MDS,  both  in  terms  of 
complexity  and  in  terms  of  estimation  accuracy.  We  have  also  derived  the  pertinent  CRB  for  the 
multiplicative  noise  model  in  [1],  [6],  which  was  adopted  for  most  of  our  simulations. 

Appendix 

Normal vs. log-normal multiplicative noise modelling: In [2], [5], the power received at node i
from node j, measured in decibel (dB), is modelled as $P_{ij} = \bar{P}_{ij} + v$, where $\bar{P}_{ij}$ is the mean power,
and v is a zero-mean Gaussian random variable of standard deviation $\sigma$. The mean power is modelled
as $\bar{P}_{ij} = P_0 - 10 n_p \log_{10}(\delta_{ij}/\delta_0)$, where $P_0$ is the mean power for a reference distance, $\delta_0$, and $n_p$ is the
path loss exponent. It follows that

$$P_0 - P_{ij} = P_0 - \bar{P}_{ij} - v = 10 n_p \log_{10}\frac{\delta_{ij}}{\delta_0} - v, \tag{22}$$

and the associated distance estimate is given by [2]

$$d_{ij} = \delta_0\, 10^{(P_0 - P_{ij})/(10 n_p)}. \tag{23}$$

Substituting $P_{ij} = \bar{P}_{ij} + v$ and $\bar{P}_{ij} = P_0 - 10 n_p \log_{10}(\delta_{ij}/\delta_0)$ yields

$$d_{ij} = \delta_{ij}\, 10^{-v/(10 n_p)}. \tag{24}$$

Notice that the noise factor is log-normal, whereas in the model of [1], [6] (also adopted herein) the
noise factor is normally distributed.
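
The substitution leading to (24) can be checked numerically. The sketch below simulates one received-power measurement under the path-loss model described in this appendix and confirms that the resulting distance estimate equals the true distance times a log-normal factor. All numeric defaults (reference power, path loss exponent, standard deviation) are hypothetical values chosen for illustration only.

```python
import math
import random

def distance_estimate(p_received_db, p0_db, delta0, n_p):
    """Distance estimate from received power, as in (23)."""
    return delta0 * 10 ** ((p0_db - p_received_db) / (10 * n_p))

def simulate_measurement(delta_ij, p0_db=-40.0, delta0=1.0, n_p=3.0,
                         sigma=2.0, rng=None):
    """Draw one noisy power measurement (in dB) and convert it to a distance.

    Returns (d_ij, v), where v is the Gaussian power noise that was drawn.
    """
    rng = rng or random.Random(0)
    v = rng.gauss(0.0, sigma)
    p_mean = p0_db - 10 * n_p * math.log10(delta_ij / delta0)  # mean power
    p_ij = p_mean + v                                          # measured power
    return distance_estimate(p_ij, p0_db, delta0, n_p), v
```

Because the noise enters through the exponent, the estimate is always positive, unlike an estimate with an additive or normally distributed multiplicative error term.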
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TABLE  I 

Two-stage  Fastmap+SD  Algorithm 

input: D

1) Run Fastmap using as pivots three anchor nodes, judiciously
placed on the three vertices of the square distribution
area. Let X be the vector containing the resulting
estimated node coordinates.

2) For i = 1 to p
begin

•  evaluate ∇stress at the point X

•  X = X − A ∇stress
end
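
Step 2 of the table can be sketched as follows. The stress function itself is defined outside this excerpt, so the code assumes the common weighted raw-stress form, the sum over node pairs of w_ij (d_ij − ||x_i − x_j||)^2, with symmetric D and W; the function names and array layout are illustrative, and the Fastmap initialization of step 1 is taken as a given input `X0`.

```python
import numpy as np

def stress(X, D, W):
    """Weighted raw stress, each unordered pair counted once (assumed form)."""
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return 0.5 * np.sum(W * (D - R) ** 2)

def stress_gradient(X, D, W):
    """Gradient of the stress with respect to every node coordinate."""
    diff = X[:, None, :] - X[None, :, :]           # x_i - x_j for all pairs
    R = np.linalg.norm(diff, axis=2)
    R_safe = np.where(R > 0, R, 1.0)               # diagonal / coincident nodes
    coeff = 2.0 * W * (R - D) / R_safe
    np.fill_diagonal(coeff, 0.0)
    return np.sum(coeff[:, :, None] * diff, axis=1)

def steepest_descent(X0, D, W, step, p):
    """Run p fixed-step steepest descent iterations starting from X0."""
    X = X0.copy()
    for _ in range(p):
        X = X - step * stress_gradient(X, D, W)
    return X
```

Setting an entry of W to zero removes the corresponding pair from both the stress and its gradient, which is how unavailable measurements are handled in the partially connected case. A fixed step that is too large for the network size can make the iteration diverge, consistent with the smaller step-sizes used for larger N in Table III.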


TABLE  II 

Computational complexities for full connectivity (N is number of nodes, m is number of spatial
dimensions)

Algorithm      Complexity
Fastmap        $O(mN)$
Fastmap+SD     $O(pmN^2)$, $p \ll N$
PCA            $O(N^3)$
Costa’s        $O(kmN^2)$, $k \ll N$

TABLE  III 

Choice of step-size A as a function of the number of nodes N

N      A
80     0.01
110    0.0075
140    0.007
170    0.006
200    0.005
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TABLE  IV 

Computational complexities for partial connectivity (s is the average number of distance
measurements collected by a node)

Algorithm      Complexity
Fastmap        $O(mN)$
Fastmap+SD     $O(pmsN)$, $p \ll N$
Costa’s        $O(kmsN)$, $k \ll N$

TABLE  V 

Choice of step-size A as a function of measurement range

Measurement Range      A
Infinite               0.01
0.3                    0.0125
0.14                   0.015

Fig.  1.  Anchor/pivot  node  placement 
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Fig.  2.  RMSE  performance  vs.  measurement  range  error  variance.  N  =  80,  all  pairwise  distance  estimates  collected. 
Measurement  error  proportional  to  the  actual  distance.  100  Monte  Carlo  runs. 




Fig.  3.  RMSE  performance  vs.  measurement  range  error  variance.  N  =  200,  all  pairwise  distance  estimates  collected. 
Measurement  error  proportional  to  the  actual  distance.  100  Monte  Carlo  runs. 
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Fig.  4.  RMSE  performance  vs.  measurement  range  error  variance.  N  =  80,  additive  noise  measurement  model,  all  pairwise 
distance  estimates  collected.  100  Monte  Carlo  runs. 


Fig.  5.  RMSE  performance  vs.  measurement  range  error  variance.  N  =  200  sensor  nodes,  all  pairwise  distance  estimates 
collected.  Additive  noise  measurement  model.  100  Monte  Carlo  runs. 
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Fig.  6.  RMSE  performance  vs.  power  noise  variance  a2.  N  =  80  sensor  nodes,  all  pairwise  distance  estimates  collected. 
Log-normal  noise  measurement  model.  100  Monte  Carlo  runs. 


Fig. 7. Computational cost in FLOPS vs. number of nodes. All pairwise distance estimates collected, $e_r^2 = 0.1$, $\epsilon = 0.1$.
Multiplicative  noise  measurement  model.  50  Monte  Carlo  runs. 
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Fig.  8.  RMSE  performances  and  CRB  for  limited  measurement  range  =  0.14  (the  weights  which  correspond  to  distances 
greater  than  this  limit  are  set  to  zero),  e  =  0.1,  A  =  0.015,  N  =  80.  100  Monte  Carlo  runs. 


Fig.  9.  RMSE  performances  and  CRB  for  limited  measurement  range  =  0.3.  e  =  0.1,  A  =  0.013,  N  =  80.  100  Monte 
Carlo  runs. 
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Fig.  10.  Computational  cost  in  FLOPS  vs.  number  of  nodes.  Pairwise  distances  collected  only  for  nodes  with  actual 
distance smaller than 0.3. $e_r^2 = 0.1$. Multiplicative measurement noise model. 50 Monte Carlo runs.


Fig.  11.  Computational  cost  in  FLOPS  vs  number  of  nodes.  Pairwise  distances  collected  only  for  nodes  with  actual  distance 
smaller than 0.14. $e_r^2 = 0.1$. Multiplicative measurement noise model. 50 Monte Carlo runs.
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A  Semidefinite  Relaxation  Approach  to  MIMO 
Detection  for  High-order  QAM  Constellations^ 

Nicholas  D.  Sidiropoulos1 ,  Zhi-Quan  Luo2 


Abstract 

A  new  and  conceptually  simple  semidefinite  relaxation  approach  is  proposed  for  MIMO  detection  in 
communication  systems  employing  high-order  QAM  constellations.  The  new  approach  affords  improved 
detection  performance  compared  to  existing  solutions  of  comparable  worst-case  complexity  order,  which 
is  nearly  cubic  in  the  dimension  of  the  transmitted  symbol  vector  and  independent  of  the  constellation 
order  for  uniform  QAM,  or  affine  in  the  constellation  order  for  non-uniform  QAM. 
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I.  Introduction 

Maximum likelihood (ML) detection in memoryless Multiple-Input Multiple-Output (MIMO) communication
systems with Gaussian noise is equivalent to a least-squares lattice search problem which is
NP-hard. For this reason, several computationally efficient approximate solutions have been developed.
The current state of the art includes two main families of high-performance MIMO detectors: those based
on  Sphere  Decoding  (SD)  [11],  [1],  [2],  [12],  [14]  and  those  based  on  Semidefinite  Relaxation  (SDR) 
[7],  [6],  [5],  [13].  SD  detectors  can  provide  the  exact  ML  solution  at  low  computational  cost,  provided 
that  the  Signal  to  Noise  Ratio  (SNR)  is  relatively  high,  and  the  aggregate  transmission  rate  is  relatively 
low.  However,  SD  cannot  efficiently  handle  high  problem  dimensions  (long  symbol  vectors)  or  high-order 
symbol  constellations,  especially  at  low  SNR,  and  it  has  recently  been  shown  that  its  expected  complexity 
is  exponential  [4],  under  certain  conditions  that  are  relatively  mild  and  general  in  our  context.  Worst-case 
complexity  of  computing  the  exact  ML  solution  is  generically  exponential,  due  to  NP-hardness. 

In contrast, SDR approaches feature polynomial worst-case complexity and very competitive performance.
Initially, SDR multiuser / MIMO detection was developed for Binary Phase-Shift Keying
(BPSK)  constellations,  but  the  ideas  were  later  extended  to  M-PSK  [7],  [6],  [5],  and,  very  recently, 
to  16-  Quadrature  Amplitude  Modulation  (16-QAM)  [13]  and  general  QAM  constellations  [8].  While 
[13]  deals  exclusively  with  16-QAM,  the  approach  can,  in  principle,  be  extended  to  higher-order  QAM 
alphabets.  This,  however,  entails  the  introduction  of  additional  slack  variables,  and  complexity  becomes 
$O(K^{6.5}N^{6.5})$, where N = O(M), M is the number of symbols, and K is the square root of the order
of  the  constellation.  The  idea  in  [13]  is  fruitful  for  16-QAM,  but  impractical  for  higher  orders.  Likewise, 
the  complexity  of  the  methods  in  [8]  ranges  from  0(iv6  -5Ar4)  to  (){Ki>J> 

In this contribution, we propose a different, $O(N^{3.5})$ relaxation for high-order QAM alphabets. Our
approach can be viewed as further relaxation of [13], only utilizing upper and lower bounds on the
symbol energy in the relaxation step. The key features of our approach are that i) it provides significant
performance  improvements  relative  to  existing  solutions  of  comparable  worst-case  complexity  order; 
and  ii)  its  complexity  is  independent  of  the  constellation  order  for  uniform  QAM,  and  affine  in  the 
constellation order for non-uniform QAM. For BPSK and 4-QAM, our approach reduces to the one in
[7].
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II.  Problem  Statement  and  Preliminaries 

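For any separable QAM constellation$^1$, ML detection in memoryless MIMO communication systems
with Gaussian noise can be formulated as the following optimization problem (possibly after noise
pre-whitening):

$$\min_{s} \; \|d - Ms\|_2^2 \tag{1}$$

$$\text{subject to: } \mathrm{Re}\{s(i)\} \in \mathcal{A}_{real}, \;\; \mathrm{Im}\{s(i)\} \in \mathcal{A}_{imag}, \;\; \forall i. \tag{2}$$

For brevity of exposition, we will assume that $\mathcal{A}_{real} = \mathcal{A}_{imag} = \mathcal{A}$ in the sequel, although our approach
generalizes trivially to different alphabets for the real and imaginary parts. We thus consider

$$\min_{s} \; \|d - Ms\|_2^2 \tag{3}$$

$$\text{subject to: } \mathrm{Re}\{s(i)\} \in \mathcal{A}, \;\; \mathrm{Im}\{s(i)\} \in \mathcal{A}, \;\; \forall i, \tag{4}$$

where d is the complex baseband received vector, M is a known baseband-equivalent channel matrix,
and s is the symbol vector. Upon defining

$$z := \begin{bmatrix} \mathrm{Re}\{d\}^T & \mathrm{Im}\{d\}^T \end{bmatrix}^T, \tag{5}$$

$$H := \begin{bmatrix} \mathrm{Re}\{M\} & -\mathrm{Im}\{M\} \\ \mathrm{Im}\{M\} & \mathrm{Re}\{M\} \end{bmatrix}, \tag{6}$$

$$r := \begin{bmatrix} \mathrm{Re}\{s\}^T & \mathrm{Im}\{s\}^T \end{bmatrix}^T, \tag{7}$$

we may convert the problem to real-valued form

$$\min_{r} \; \|z - Hr\|_2^2 \tag{8}$$

$$\text{subject to: } r(i) \in \mathcal{A}, \;\; \forall i. \tag{9}$$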
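
The conversion to real-valued form can be sanity-checked numerically: stacking real and imaginary parts as in (5)-(7) should leave the least-squares cost unchanged. The following is a minimal sketch; the function names are illustrative and not from the original work.

```python
import numpy as np

def to_real_form(d, M):
    """Build z and H of (5)-(6) from the complex received vector d and channel M."""
    z = np.concatenate([d.real, d.imag])
    H = np.block([[M.real, -M.imag],
                  [M.imag,  M.real]])
    return z, H

def stack_symbols(s):
    """Build r of (7) by stacking the real and imaginary parts of s."""
    return np.concatenate([s.real, s.imag])
```

With these definitions, `np.linalg.norm(d - M @ s) ** 2` and `np.linalg.norm(z - H @ r) ** 2` agree up to floating point error for any complex s, so every entry of r can be constrained to the same real alphabet.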

III.  Proposed  solution 

Assume  that  A  is  symmetric  about  the  origin  (always  the  case  for  QAM  constellations).  In  this  case, 
if r satisfies the finite alphabet constraints in (9), then so does tr, for $t \in \{-1, 1\}$. Furthermore,

$$\|z - Hr\|_2^2 = r^T H^T H r - 2 z^T H r + z^T z. \tag{10}$$

1 Separable constellations are almost always adopted for ease of decoding, even in the single-input single-output case.

It follows that (8)-(9) is equivalent to

$$\min \; \left( r^T H^T H r - 2 z^T H t r \right) \tag{11}$$

$$\text{subject to: } r(i) \in \mathcal{A}, \;\; \forall i, \quad t \in \{-1, 1\}. \tag{12}$$

Further defining $x := [\, r^T \;\; t \,]^T \in \mathbb{R}^N$ and

$$Q := \begin{bmatrix} H^T H & -H^T z \\ -z^T H & 0 \end{bmatrix}, \tag{13}$$

problem (11)-(12) can be put in homogeneous quadratic form

$$\min \; x^T Q x \tag{14}$$

$$\text{subject to: } x(i) \in \mathcal{A}, \;\; \forall i \in \{1, \cdots, N-1\}, \quad x(N) \in \{-1, 1\}. \tag{15}$$

Using $x^T Q x = \mathrm{Trace}(x^T Q x) = \mathrm{Trace}(Q x x^T)$, and denoting $X := x x^T$, we can rewrite problem
(14)-(15) equivalently as:

$$\min \; \mathrm{Trace}(QX) \tag{16}$$

$$\text{subject to: } X \succeq 0, \quad \mathrm{rank}(X) = 1, \tag{17}$$

$$X(i,i) \in \mathcal{A}^2, \;\; \forall i \in \{1, \cdots, N-1\}, \quad X(N,N) = 1. \tag{18}$$

Problem (16)-(18) entails nonconvex constraints: the rank(X) = 1 constraint, as well as the finite
(squared) alphabet constraints $X(i,i) \in \mathcal{A}^2$, $\forall i \in \{1, \cdots, N-1\}$. Dropping the rank-one constraint,
and relaxing the constraints $X(i,i) \in \mathcal{A}^2$, $\forall i \in \{1, \cdots, N-1\}$ to the convex half-space constraints
$L := \min_{a \in \mathcal{A}} a^2 \leq X(i,i) \leq \max_{a \in \mathcal{A}} a^2 =: U$, $\forall i \in \{1, \cdots, N-1\}$, we obtain the following convex
relaxation:

$$\min \; \mathrm{Trace}(QX) \tag{19}$$

$$\text{subject to: } X \succeq 0, \tag{20}$$

$$L \leq X(i,i) \leq U, \;\; \forall i \in \{1, \cdots, N-1\}, \quad X(N,N) = 1. \tag{21}$$

Note that (19)-(21) is not a Lagrangian relaxation of (16)-(18), because, in addition to the rank-one
constraint, we have relaxed the alphabet constraints. This means that the bi-dual interpretation does not
hold for our relaxation in (19)-(21). For a bi-dual relaxation see [13]. Our proposed relaxation in (19)-(21)
can be viewed as further relaxation of [13], and it affords lower complexity for large $|\mathcal{A}|$ compared to
[13].
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The  relaxed  problem  in  (19)-(21)  can  be  solved  using  any  of  the  available  modern  SDP  solvers,  such 
as  SeDuMi  [10],  based  on  interior  point  methods.  After  this  step,  an  approximate  solution  to  the  original 
problem can be generated using Gaussian randomization: that is, drawing random vectors $x \sim \mathcal{N}(0, X_0)$,
where $X_0$ denotes the solution of (19)-(21), quantizing each element of x to the nearest point in $\mathcal{A}$,
reconstructing s from the quantized x, and picking the s that yields the smallest cost in (3).
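
The randomization step described above can be sketched as follows, taking the SDP solution X0 and the cost matrix Q as given inputs (solving the SDP itself is not shown). Two details are assumptions of this sketch rather than statements from the text: the last entry of each draw is quantized to its sign t and folded into the first N-1 entries, and candidates are ranked by the homogeneous cost x^T Q x, which differs from the cost in (3) only by the constant z^T z.

```python
import numpy as np

def quantize(v, alphabet):
    """Map each entry of v to the nearest point of a finite real alphabet."""
    alphabet = np.asarray(alphabet, dtype=float)
    nearest = np.argmin(np.abs(v[:, None] - alphabet[None, :]), axis=1)
    return alphabet[nearest]

def gaussian_randomization(X0, Q, alphabet, num_draws, rng):
    """Draw x ~ N(0, X0), quantize, and keep the lowest-cost candidate r."""
    N = X0.shape[0]
    w, V = np.linalg.eigh(X0)
    F = V * np.sqrt(np.clip(w, 0.0, None))     # X0 = F F^T, valid if X0 is singular
    best_r, best_cost = None, np.inf
    for _ in range(num_draws):
        x = F @ rng.standard_normal(N)
        t = 1.0 if x[-1] >= 0 else -1.0        # quantize the last entry to +/-1
        r = quantize(t * x[:-1], alphabet)     # fold the sign into r
        xq = np.concatenate([r, [1.0]])
        cost = xq @ Q @ xq
        if cost < best_cost:
            best_r, best_cost = r, cost
    return best_r, best_cost
```

Each draw costs one matrix-vector product plus a quadratic form, in line with the per-draw complexity order quoted in the complexity discussion.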

A.  Complexity 

The worst-case complexity of solving a generic SDP problem involving a matrix variable of size N x N
and O(N) linear constraints is $O(N^{6.5})$. That would imply a complexity of $O(N^{6.5})$ for problem (19)-(21).
However, exploiting the fact that the constraints in (21) are separable and only apply to the diagonal
elements of X, that figure can be reduced to $O(N^{3.5})$, which is very competitive (N = 2M + 1, where
M is the number of QAM symbols). The complexity of the randomization step is $O(N^2)$ per draw. We
emphasize  that,  unlike  [13],  the  complexity  of  the  overall  algorithm  is  independent  of  the  constellation 
order  for  uniform  QAM,  and  affine  in  the  constellation  order  for  non-uniform  QAM.  This  is  because 
the  quantization  step  in  the  randomization  loop  amounts  to  simple  scaling  and  rounding  for  uniform 
constellations,  but  may  require  a  linear  search  for  non-uniform  constellations. 
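
The difference between the two quantization modes can be illustrated with a short sketch. It assumes that a uniform alphabet consists of the K odd integers from -(K-1) to K-1 (for example {-3, -1, 1, 3} for 16-QAM, with K the square root of the constellation order); any other scaling would only change the constants. The function names are illustrative.

```python
import math

def quantize_uniform(v, K):
    """Nearest point of {-(K-1), ..., -1, 1, ..., K-1} by scaling and rounding.

    Constant work per element, independent of K.
    """
    index = math.floor((v + (K - 1)) / 2.0 + 0.5)   # round to nearest level index
    index = min(max(index, 0), K - 1)               # clip to the alphabet range
    return 2 * index - (K - 1)

def quantize_search(v, alphabet):
    """Nearest point of an arbitrary (possibly non-uniform) alphabet.

    Linear search, so the work grows with the alphabet size.
    """
    return min(alphabet, key=lambda a: abs(v - a))
```

The first routine does the same amount of work for any K, while the second scans the whole alphabet, which matches the stated dependence of the overall complexity on the constellation order.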

IV.  Simulations 

We  conducted  Monte-Carlo  (MC)  simulation  experiments  for  two  indicative  MIMO  transmission 
scenarios:  a  16  x  16  system  using  64-QAM,  and  an  8  x  8  system  using  16-QAM.  In  both  cases, 
the  channel  matrix  comprised  i.i.d.  elements  drawn  from  a  circularly  symmetric  zero-mean  complex 
normal distribution of unit variance ($\mathcal{CN}(0, 1)$), and a new channel realization was drawn for each vector
transmission (MC trial). The signal to noise ratio is defined as $\mathrm{SNR} := 10 \log_{10}(M E_s / N_0)$, where M is the
length of the transmitted QAM symbol vector s, $E_s$ is the mean symbol energy of the QAM constellation,
and the noise vector is i.i.d. $\mathcal{CN}(0, N_0)$.

In order to gauge performance as a function of the number of randomizations, we tested our SDR algorithm
with 100, 300, and 1000 randomization samples per decoded vector. As baselines for comparison,
we employed i) the Schnorr-Euchner variant of SD (SE-SD) with an infinite radius so that the optimal
solution is always obtained; and ii) two commonly used suboptimal solutions of complexity $O(M^3)$:
the quantized output of the zero-forcing linear receiver (QZF), and the (nonlinear) block MMSE-DFE
(BMMSE-DFE) [3], [9]. Two performance metrics were used: Symbol Error Rate (SER), and worst-case
execution$^2$ time. SE-SD was implemented as a Matlab executable (mex) compiled from optimized C
code; SDR was implemented using the general-purpose SeDuMi toolbox [10]. As a result, execution
time estimates are somewhat biased in favor of SE-SD. The reason for using a measure of worst-case (as
opposed to average) complexity is that in on-line applications we have to decode within a specified time,
and bad channels do happen with positive probability. The choice between execution time or number of
floating point operations is debatable, especially because SE-SD was implemented in mex/C; but we are
interested in order-of-magnitude estimates, and differences in execution time are easier to appreciate.

2 On an Intel Centrino 1.6GHz system, with 512M RAM.

Figures  1  and  2  show  the  SER  versus  SNR  and  worst-case  execution  time  versus  SNR,  respectively, 
for the 16 x 16 system using 64-QAM ($64^{16} \approx 8 \times 10^{28}$ candidate symbol vectors). From figure 2, it is evident that SE-SD is too
complex  for  this  configuration;  very  long  runs  are  actually  not  atypical.  Due  to  this,  figure  2  actually 
shows  a  lower  bound  on  the  worst-case  execution  time  of  SE-SD,  computed  from  far  fewer  realizations. 
The  associated  SER  cannot  be  estimated  in  reasonable  time,  and  is  therefore  not  reported  in  figure  1. 
SDR  provides  a  performance  improvement  of  up  to  7.5  dB  over  BMMSE-DFE.  Note  that  the  worst-case 
complexity  of  SDR  is  essentially  independent  of  SNR.  In  fact  the  point-wise  complexity  of  SDR  is  very 
stable  and  predictable  for  any  problem  realization.  This  is  good  at  low  to  moderate  SNR,  but  a  drawback 
at  high  SNR  where  the  detection  problem  becomes  easier.  Also  note  that  the  number  of  randomization 
samples  used  in  SDR  does  not  affect  the  grosso  modo  complexity  order,  as  expected;  and  a  moderate 
number  of  randomizations  is  sufficient. 

Figures 3 and 4 show corresponding results for the 8 x 8 system using 16-QAM ($16^{8} \approx 4.3 \times 10^{9}$ candidate symbol vectors).
Notice  that,  in  this  (far)  simpler  scenario,  SE-SD  is  much  more  efficient  computationally  than  SDR,  and 
it  always  yields  the  exact  ML  solution.  SDR  is  up  to  7.5  dB  away  from  SE-SD,  at  a  uniformly  higher 
computational  cost  across  the  range  of  SNR  of  interest.  It  clearly  makes  no  sense  to  use  SDR  in  this 
case. 

Summarizing,  the  SD  family  of  detectors  exhibits  a  threshold  behavior:  it  either  works  very  well  (for 
low-enough  symbol  vector  dimension,  order  of  the  individual  symbol  constellation,  and  high-enough 
SNR)  or  it  “freezes”.  The  threshold  between  the  two  regimes  depends  on  a  combination  of  these  three 
factors.  When  SD  works,  it  outperforms  SDR  in  terms  of  complexity  and  SER  performance.  In  difficult 
scenarios,  SDR  offers  an  attractive  alternative  relative  to  earlier  solutions. 

V.  Conclusions 

We  have  proposed  a  new  SDR  approach  for  MIMO  detection  of  high-order  QAM  constellations.  The 
new  approach  is  the  simplest  one  in  the  class  of  SDR  detectors  for  high-order  QAM:  its  worst-case 
complexity  is  nearly  cubic  in  the  dimension  of  the  transmitted  symbol  vector,  and  independent  of  the 
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constellation  order  for  uniform  QAM  /  affine  in  the  constellation  order  for  non-uniform  QAM.  Under 

certain  conditions,  the  new  approach  affords  significant  improvements  in  SER  over  prior  methods. 
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Fig.  1.  SER  versus  SNR:  16  x  16  system,  64-QAM  symbols. 


Fig.  2.  Worst-case  execution  time  versus  SNR:  16  x  16  system,  64-QAM  symbols. 
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symbols,  16-QAM,  8  x  8  iid  CN(0,1)  channel,  MC=1000,  new  channel  per  MC 




Fig.  3.  SER  versus  SNR:  8x8  system,  16-QAM  symbols. 


Fig.  4.  Worst-case  execution  time  versus  SNR:  8x8  system,  16-QAM  symbols. 
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APPROXIMATION  BOUNDS  FOR  QUADRATIC  OPTIMIZATION  WITH 
HOMOGENEOUS  QUADRATIC  CONSTRAINTS 

ZHI-QUAN LUO*, NICHOLAS D. SIDIROPOULOS†, PAUL TSENG‡, AND SHUZHONG ZHANG§


Abstract. We consider the NP-hard problem of finding a minimum norm vector in n-dimensional real or
complex Euclidean space, subject to m concave homogeneous quadratic constraints. We show that a semidefinite
programming (SDP) relaxation for this nonconvex quadratically constrained quadratic program (QP) provides an
$O(m^2)$ approximation in the real case, and an $O(m)$ approximation in the complex case. Moreover, we show that
these bounds are tight up to a constant factor. When the Hessian of each constraint function is of rank 1 (namely,
outer products of some given so-called steering vectors) and the phase spread of the entries of these steering vectors
is bounded away from $\pi/2$, we establish a certain “constant factor” approximation (depending on the phase
spread but independent of m and n) for both the SDP relaxation and a convex QP restriction of the original
NP-hard problem. Finally, we consider a related problem of finding a maximum norm vector subject to m convex
homogeneous quadratic constraints. We show that an SDP relaxation for this nonconvex QP provides an $O(1/\ln(m))$
approximation, which is analogous to a result of Nemirovski, Roos and Terlaky [14] for the real case.


Key  words,  semidefinite  programming  relaxation,  nonconvex  quadratic  optimization,  approximation  bound 


AMS  subject  classifications.  90C22,  90C20,  90C59 


1. Introduction. Consider the quadratic optimization problem with concave homogeneous
quadratic constraints:

$$\begin{array}{rll} \upsilon_{qp} := & \min & \|z\|^2 \\ & \text{s.t.} & \sum_{\ell \in \mathcal{I}_i} |h_\ell^H z|^2 \geq 1, \quad i = 1, \ldots, m, \\ & & z \in \mathbb{F}^n, \end{array} \tag{1.1}$$

where $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$, $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{F}^n$, $m \geq 1$, each $h_\ell$ is a given vector in
$\mathbb{F}^n$, and $\mathcal{I}_1, \ldots, \mathcal{I}_m$ are nonempty, mutually disjoint index sets satisfying $\mathcal{I}_1 \cup \cdots \cup \mathcal{I}_m = \{1, \ldots, M\}$.
Throughout, the superscript “H” will denote the complex Hermitian transpose, i.e., for z = x + iy,
where $x, y \in \mathbb{R}^n$ and $i^2 = -1$, $z^H = x^T - i y^T$. Geometrically, the above problem (1.1) corresponds
to finding a least norm vector in a region defined by the intersection of the exteriors of m co-centered
ellipsoids. If the vectors $h_1, \ldots, h_M$ are linearly independent, then M equals the sum of
the rank of the matrices defining these m ellipsoids. Notice that the problem (1.1) is easily solved
for the case of n = 1, so we assume $n \geq 2$.

* Department of Electrical and Computer Engineering, University of Minnesota, 200 Union Street SE,
Minneapolis, MN 55455 (luozq@ece.umn.edu). The work of this author is supported in part by the National Science
Foundation, Grant No. DMS-0312416.

† Department of Electronic and Computer Engineering, Technical University of Crete, 73100 Chania - Crete,
Greece (nikos@telecom.tuc.gr). The work of this author is supported in part by the U.S. ARO under ERO,
Contract No. N62558-03-C-0012, and the EU under U-BROAD STREP, Grant No. 506790.

‡ Department of Mathematics, University of Washington, Seattle, Washington 98195
(tseng@math.washington.edu). The work of this author is supported by the National Science Foundation,
Grant No. DMS-0511283.

§ Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong,
Shatin, Hong Kong (zhang@se.cuhk.edu.hk). The work of this author is supported by Hong Kong RGC Earmarked
Grant CUHK418505.

We assume that $\sum_{\ell \in \mathcal{I}_i} \|h_\ell\| \ne 0$ for all $i$, which is clearly a necessary condition for (1.1) to
be feasible. This is also a sufficient condition (since
$\bigcup_{i=1}^m \{z \mid \sum_{\ell \in \mathcal{I}_i} |h_\ell^H z|^2 = 0\}$ is a finite union
of proper subspaces of $\mathbb{F}^n$, so its complement is nonempty and any point in its complement can
be scaled to be feasible for (1.1)). Thus, the above problem (1.1) always has an optimal solution
(not necessarily unique) since its objective function is coercive and continuous, and its feasible
set is nonempty and closed. Notice, however, that the feasible set of (1.1) is typically nonconvex and
disconnected, with an exponential number of connected components exhibiting little symmetry.
This is in contrast to the quadratic problems with convex feasible set but nonconvex objective
function considered in [13, 14, 22]. Furthermore, unlike the class of quadratic problems studied in
[1, 7, 8, 15, 16, 21, 23, 24, 25, 26], the constraint functions in (1.1) do not depend on $z_1^2, \ldots, z_n^2$ only.
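
The scaling argument used above for sufficiency can be illustrated numerically. The following Python sketch is ours and not part of the original analysis; the function names and the small instance (two receivers, $n = 3$) are hypothetical. It draws a generic point and rescales it so that the smallest constraint value in (1.1) becomes 1.

```python
import random

def constraint_values(h_groups, z):
    """Return sum_{l in I_i} |h_l^H z|^2 for each index set I_i."""
    values = []
    for group in h_groups:
        total = 0.0
        for h in group:
            inner = sum(hc.conjugate() * zc for hc, zc in zip(h, z))
            total += abs(inner) ** 2
        values.append(total)
    return values

def scale_to_feasible(h_groups, z):
    """Scale z so that the smallest constraint value becomes 1."""
    smallest = min(constraint_values(h_groups, z))
    if smallest == 0.0:
        raise ValueError("z lies in one of the excluded subspaces")
    factor = smallest ** -0.5
    return [factor * zc for zc in z]

random.seed(0)
n = 3
# Hypothetical data: receiver 1 has two antennas, receiver 2 has one.
h_groups = [
    [[1 + 1j, 0j, 2 + 0j], [0j, 1j, 1 + 0j]],
    [[1 + 0j, -1 + 0j, 0j]],
]
z0 = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
z_feasible = scale_to_feasible(h_groups, z0)
feasible_values = constraint_values(h_groups, z_feasible)
```

The scaled point is feasible but in general far from optimal; the sketch only shows why feasibility of (1.1) is never the difficulty.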

Our interest in the nonconvex QP (1.1) is motivated by the transmit beamforming problem
for multicasting applications [20] and by the wireless sensor network localization problem [6]. In
the transmit beamforming problem, a transmitter utilizes an array of $n$ transmitting antennas to
broadcast information within its service area to $m$ radio receivers, with receiver $i \in \{1, \ldots, m\}$
equipped with $|\mathcal{I}_i|$ receiving antennas. Let $h_\ell$, $\ell \in \mathcal{I}_i$, denote the $n \times 1$ complex steering vector
modelling propagation loss and phase shift from the transmitting antennas to the $\ell$th receiving
antenna of receiver $i$. Assuming that each receiver performs spatially matched filtering / maximum
ratio combining, which is the optimal combining strategy under standard mild assumptions, the
constraint

\[
\sum_{\ell \in \mathcal{I}_i} |h_\ell^H z|^2 \ge 1
\]

models the requirement that the total received signal power at receiver $i$ must be above a given
threshold (normalized to 1). This constraint is also equivalent to a signal-to-noise ratio (SNR)
condition commonly used in data communication. Thus, to minimize the total transmit power
subject to individual SNR requirements (one at each receiver), we are led to the QP (1.1). In the
special case where each radio receiver is equipped with a single receiving antenna, the problem
reduces to [20]:


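\[
\begin{array}{ll}
\min & \|z\|^2 \\
\text{s.t.} & |h_\ell^H z|^2 \ge 1, \quad \ell = 1, \ldots, m, \\
 & z \in \mathbb{F}^n.
\end{array}
\tag{1.2}
\]

This problem is a special case of (1.1) whereby each ellipsoid lies in $\mathbb{F}^n$ and the corresponding
matrix has rank 1.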

In this paper, we first show that the nonconvex QP (1.2) is NP-hard in either the real or
the complex case, which further implies the NP-hardness of the general problem (1.1). Then, we
consider a semidefinite programming (SDP) relaxation of (1.1) and a convex QP restriction of (1.2)
and study their worst-case performance. In particular, let $v_{sdp}$, $v_{cqp}$ and $v_{qp}$ denote the optimal
values of the SDP relaxation, the convex QP restriction, and the original QP (1.1), respectively.
We establish a performance ratio of $v_{qp}/v_{sdp} = O(m^2)$ for the SDP relaxation in the real case,
and we give an example showing that this bound is tight up to a constant factor. Similarly, we
establish a performance ratio of $v_{qp}/v_{sdp} = O(m)$ in the complex case, and we give an example
showing the tightness of this bound. We further show that, in the case when the phase spread of
the entries of $h_1, \ldots, h_M$ is bounded away from $\pi/2$, the performance ratios $v_{qp}/v_{sdp}$ and $v_{cqp}/v_{qp}$
for the SDP relaxation and the convex QP restriction, respectively, are independent of $m$ and $n$.

In  recent  years,  there  have  been  extensive  studies  of  the  performance  of  SDP  relaxations  for 
nonconvex  QP.  However,  to  our  knowledge,  this  is  the  first  performance  analysis  of  SDP  relaxation 
for  QP  with  concave  quadratic  constraints.  Our  proof  techniques  also  extend  to  a  maximization 
version  of  the  QP  (1.1)  with  convex  homogeneous  quadratic  constraints.  In  particular,  we  give  a 
simple proof of a result analogous to one of Nemirovski, Roos and Terlaky [14] (also see [13, Theorem 4.7])
for the real case, namely, the SDP relaxation for this nonconvex QP has a performance
ratio of $O(1/\ln(m))$.

2.  NP-hardness.  In  this  section,  we  show  that  the  nonconvex  QP  (1.1)  is  NP-hard  in  general. 
First,  we  notice  that,  by  a  linear  transformation  if  necessary,  the  following  problem 

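\[
\begin{array}{ll}
\text{minimize} & z^H Q z \\
\text{subject to} & |z_\ell| \ge 1, \quad \ell = 1, \ldots, n, \\
 & z \in \mathbb{F}^n,
\end{array}
\tag{2.1}
\]

is a special case of (1.1), where $Q \in \mathbb{F}^{n \times n}$ is a Hermitian positive definite matrix (i.e., $Q \succ 0$),
and $z_\ell$ denotes the $\ell$th component of $z$. Hence, it suffices to establish the NP-hardness of (2.1). To
this end, we consider a reduction from the NP-complete partition problem: Given positive integers
$a_1, a_2, \ldots, a_N$, decide whether there exists a subset $\mathcal{I}$ of $\{1, \ldots, N\}$ satisfying

\[
\sum_{\ell \in \mathcal{I}} a_\ell = \frac{1}{2} \sum_{\ell=1}^{N} a_\ell.
\tag{2.2}
\]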

Our  reductions  differ  for  the  real  and  complex  cases.  As  will  be  seen,  the  NP-hardness  proof  in 
the  complex  case1  is  more  intricate  than  in  the  real  case. 

2.1. The Real Case. We consider the real case of $\mathbb{F} = \mathbb{R}$. Let $n := N$ and

\[
a := (a_1, \ldots, a_N)^T, \qquad Q := a a^T + I_n \succ 0,
\]

where $I_n$ denotes the $n \times n$ identity matrix.

1This  NP-hardness  proof  was  first  presented  in  an  appendix  of  [20]  and  is  included  here  for  completeness;  also 
see  [26,  Proposition  3.5]  for  a  related  proof. 




We show that a subset $\mathcal{I}$ satisfying (2.2) exists if and only if the optimization problem (2.1)
has a minimum value of $n$. Since

\[
z^T Q z = |a^T z|^2 + \sum_{\ell=1}^{n} |z_\ell|^2 \ge n \quad \text{whenever } |z_\ell| \ge 1 \ \forall\, \ell, \ z \in \mathbb{R}^n,
\]

we see that (2.1) has a minimum value of $n$ if and only if there exists a $z \in \mathbb{R}^n$ satisfying

\[
a^T z = 0, \qquad |z_\ell| = 1 \ \forall\, \ell.
\]

The above condition is equivalent to the existence of a subset $\mathcal{I}$ satisfying (2.2), with the
correspondence $\mathcal{I} = \{\ell \mid z_\ell = 1\}$. This completes the proof.
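
To make the reduction concrete, the following Python sketch (ours, for illustration only; the function name and the two instances are hypothetical) enumerates sign vectors $z \in \{-1, +1\}^n$ and evaluates $z^T Q z = (a^T z)^2 + n$. By the argument above, the minimum of (2.1) equals $n$ exactly when some sign vector attains $a^T z = 0$, so restricting the search to sign vectors is enough to decide whether the value $n$ is reached. The enumeration is exponential in $n$, in line with the NP-hardness claim.

```python
from itertools import product

def min_qp_over_signs(a):
    """Minimize (a^T z)^2 + n over z in {-1, +1}^n by brute force."""
    n = len(a)
    best_value, best_z = None, None
    for z in product((-1, 1), repeat=n):
        value = sum(al * zl for al, zl in zip(a, z)) ** 2 + n
        if best_value is None or value < best_value:
            best_value, best_z = value, z
    return best_value, best_z

# Hypothetical instance with a valid partition: 3 + 2 = 1 + 1 + 2 + 1.
value_yes, z_yes = min_qp_over_signs([3, 1, 1, 2, 2, 1])
# Hypothetical instance without one (the total 5 is odd).
value_no, z_no = min_qp_over_signs([1, 1, 3])
# Index set recovered through the correspondence I = {l | z_l = 1} (0-based here).
subset_yes = [l for l, zl in enumerate(z_yes) if zl == 1]
```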


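2.2. The Complex Case. We consider the complex case of $\mathbb{F} = \mathbb{C}$. Let $n := 2N + 1$ and

\[
a := (a_1, \ldots, a_N)^T, \qquad
A := \begin{bmatrix} I_N & I_N & -e_N \\ a^T & 0_N^T & -\frac{1}{2} a^T e_N \end{bmatrix}, \qquad
Q := A^H A + I_n \succ 0,
\]

where $e_N$ denotes the $N$-dimensional vector of ones, $0_N$ denotes the $N$-dimensional vector of zeros,
and $I_n$ and $I_N$ are identity matrices of sizes $n \times n$ and $N \times N$, respectively.

We show that a subset $\mathcal{I}$ satisfying (2.2) exists if and only if the optimization problem (2.1)
has a minimum value of $n$. Since

\[
z^H Q z = \|Az\|^2 + \sum_{\ell=1}^{n} |z_\ell|^2 \ge n \quad \text{whenever } |z_\ell| \ge 1 \ \forall\, \ell, \ z \in \mathbb{C}^n,
\]

we see that (2.1) has a minimum value of $n$ if and only if there exists a $z \in \mathbb{C}^n$ satisfying

\[
Az = 0, \qquad |z_\ell| = 1 \ \forall\, \ell.
\]

Expanding $Az = 0$ gives the following set of linear equations:

\[
0 = z_\ell + z_{N+\ell} - z_n, \quad \ell = 1, \ldots, N,
\tag{2.3}
\]
\[
0 = \sum_{\ell=1}^{N} a_\ell z_\ell - \frac{1}{2} \Big( \sum_{\ell=1}^{N} a_\ell \Big) z_n.
\tag{2.4}
\]

For $\ell = 1, \ldots, 2N$, since $|z_\ell| = |z_n| = 1$ so that $z_\ell / z_n = e^{i\theta_\ell}$ for some $\theta_\ell \in [0, 2\pi)$, we can rewrite
(2.3) as

\[
\cos\theta_\ell + \cos\theta_{N+\ell} = 1, \qquad
\sin\theta_\ell + \sin\theta_{N+\ell} = 0, \qquad \ell = 1, \ldots, N.
\]

These equations imply that $\theta_\ell \in \{-\pi/3, \pi/3\}$ (modulo $2\pi$) for all $\ell \ne n$. In fact, these
equations further imply that $\cos\theta_\ell = \cos\theta_{N+\ell} = 1/2$ for $\ell = 1, \ldots, N$, so that

\[
\mathrm{Re}\Big( \sum_{\ell=1}^{N} a_\ell e^{i\theta_\ell} - \frac{1}{2} \sum_{\ell=1}^{N} a_\ell \Big) = 0.
\]

Therefore, (2.4) is satisfied if and only if

\[
\mathrm{Im}\Big( \sum_{\ell=1}^{N} a_\ell e^{i\theta_\ell} - \frac{1}{2} \sum_{\ell=1}^{N} a_\ell \Big) = \sum_{\ell=1}^{N} a_\ell \sin\theta_\ell = 0,
\]

which is further equivalent to the existence of a subset $\mathcal{I}$ satisfying (2.2), with the correspondence
$\mathcal{I} = \{\ell \mid \theta_\ell = \pi/3\}$. This completes the proof.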
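
The correspondence between subsets and unit-modulus solutions of $Az = 0$ can be checked directly. The Python sketch below is ours and purely illustrative (function names and the instance are hypothetical): from a candidate subset it builds $z$ with $z_\ell = e^{\pm i\pi/3}$, $z_{N+\ell} = \bar{z}_\ell$ and $z_n = 1$, and evaluates the right-hand sides of (2.3) and (2.4).

```python
import cmath
import math

def certificate_from_subset(a, subset):
    """Build z in C^(2N+1) with |z_l| = 1 from a candidate subset (0-based indices)."""
    N = len(a)
    theta = [math.pi / 3 if l in subset else -math.pi / 3 for l in range(N)]
    z = [cmath.exp(1j * t) for t in theta]       # z_1, ..., z_N
    z += [cmath.exp(-1j * t) for t in theta]     # z_{N+1}, ..., z_{2N}
    z.append(1 + 0j)                             # z_n with n = 2N + 1
    return z

def residual_Az(a, z):
    """Return the N + 1 entries of A z, i.e. the right-hand sides of (2.3) and (2.4)."""
    N = len(a)
    rows = [z[l] + z[N + l] - z[2 * N] for l in range(N)]
    rows.append(sum(a[l] * z[l] for l in range(N)) - 0.5 * sum(a) * z[2 * N])
    return rows

a = [3, 1, 1, 2, 2, 1]
good = {0, 3}   # a_1 + a_4 = 5, half of the total 10
bad = {0, 1}    # a_1 + a_2 = 4, not a valid partition
res_good = max(abs(r) for r in residual_Az(a, certificate_from_subset(a, good)))
res_bad = max(abs(r) for r in residual_Az(a, certificate_from_subset(a, bad)))
```

For the valid subset the residual vanishes up to rounding, while for the invalid one only (2.4) fails, with an imaginary residual proportional to the imbalance between the two parts.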

3. Performance analysis of SDP relaxation. In this section, we study the performance
of an SDP relaxation of (1.1). Let

\[
H_i := \sum_{\ell \in \mathcal{I}_i} h_\ell h_\ell^H, \quad i = 1, \ldots, m.
\]

The well-known SDP relaxation of (1.1) [11, 19] is

\[
\begin{array}{rll}
v_{sdp} := & \min & \mathrm{Tr}(Z) \\
 & \text{s.t.} & \mathrm{Tr}(H_i Z) \ge 1, \quad i = 1, \ldots, m, \\
 & & Z \succeq 0, \ Z \in \mathbb{F}^{n \times n} \text{ is Hermitian}.
\end{array}
\tag{3.1}
\]

An optimal solution of the SDP relaxation (3.1) can be computed efficiently using, say, interior-point
methods; see [18] and references therein.

Clearly $v_{sdp} \le v_{qp}$. We are interested in upper bounds for the relaxation performance of the
form

\[
v_{qp} \le C\, v_{sdp},
\]

where $C \ge 1$. Since we assume $H_i \ne 0$ for all $i$, it is easily checked that (3.1) has an optimal
solution, which we denote by $Z^*$.

3.1. General steering vectors: the real case. We consider the real case of $\mathbb{F} = \mathbb{R}$.
Upon obtaining an optimal solution $Z^*$ of (3.1), we construct a feasible solution of (1.1) using the
following randomization procedure:

1. Generate a random vector $\xi \in \mathbb{R}^n$ from the real-valued normal distribution $N(0, Z^*)$.

2. Let $z^*(\xi) = \xi \big/ \min_{1 \le i \le m} \sqrt{\xi^T H_i \xi}$.

We will use $z^*(\xi)$ to analyze the performance of the SDP relaxation. Similar procedures have been
used for related problems [1, 3, 4, 5, 14]. First, we need to develop two lemmas. The first lemma
estimates the left tail of the distribution of a convex quadratic form of a Gaussian random vector.

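Lemma 3.1. Let $H \in \mathbb{R}^{n \times n}$, $Z \in \mathbb{R}^{n \times n}$ be two symmetric positive semidefinite matrices
(i.e., $H \succeq 0$, $Z \succeq 0$). Suppose $\xi \in \mathbb{R}^n$ is a random vector generated from the real-valued normal
distribution $N(0, Z)$. Then, for any $\gamma > 0$,

\[
\mathrm{Prob}\big( \xi^T H \xi < \gamma\, E(\xi^T H \xi) \big) \le \max\Big\{ \sqrt{\gamma}, \ \frac{2(\bar{r} - 1)\gamma}{\pi - 2} \Big\},
\tag{3.2}
\]

where $\bar{r} := \min\{\mathrm{rank}(H), \mathrm{rank}(Z)\}$.

Proof. Since the covariance matrix $Z \succeq 0$ has rank $r := \mathrm{rank}(Z)$, we can write $Z = UU^T$, for
some $U \in \mathbb{R}^{n \times r}$ satisfying $U^T Z U = I_r$. Let $\bar{\xi} := Q^T U^T \xi \in \mathbb{R}^r$, where $Q \in \mathbb{R}^{r \times r}$ is an orthogonal
matrix corresponding to the eigen-decomposition of the matrix

\[
U^T H U = Q \Lambda Q^T,
\]

for some diagonal matrix $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r \ge 0$. Since $U^T H U$ has
rank at most $\bar{r}$, we have $\lambda_i = 0$ for all $i > \bar{r}$. It is readily checked that $\bar{\xi}$ has the normal distribution
$N(0, I_r)$. Moreover, $\xi$ is statistically identical to $UQ\bar{\xi}$, so that $\xi^T H \xi$ is statistically identical to

\[
\bar{\xi}^T Q^T U^T H U Q \bar{\xi} = \bar{\xi}^T \Lambda \bar{\xi} = \sum_{i=1}^{\bar{r}} \lambda_i |\bar{\xi}_i|^2.
\]

Then, we have

\[
\mathrm{Prob}\big( \xi^T H \xi < \gamma\, E(\xi^T H \xi) \big)
= \mathrm{Prob}\Big( \sum_{i=1}^{\bar{r}} \lambda_i |\bar{\xi}_i|^2 < \gamma\, E\Big( \sum_{i=1}^{\bar{r}} \lambda_i |\bar{\xi}_i|^2 \Big) \Big)
= \mathrm{Prob}\Big( \sum_{i=1}^{\bar{r}} \lambda_i |\bar{\xi}_i|^2 < \gamma \sum_{i=1}^{\bar{r}} \lambda_i \Big).
\]

If $\lambda_1 = 0$, then this probability is zero, which proves (3.2). Thus, we will assume that $\lambda_1 > 0$. Let
$\bar{\lambda}_i := \lambda_i / (\lambda_1 + \cdots + \lambda_{\bar{r}})$, for $i = 1, \ldots, \bar{r}$. Clearly, we have

\[
\bar{\lambda}_1 + \cdots + \bar{\lambda}_{\bar{r}} = 1, \qquad \bar{\lambda}_1 \ge \bar{\lambda}_2 \ge \cdots \ge \bar{\lambda}_{\bar{r}} \ge 0.
\]

We consider two cases. First, suppose $\bar{\lambda}_1 \ge \alpha$, where $0 < \alpha < 1$. Then, we can bound the
above probability as follows:

\[
\begin{aligned}
\mathrm{Prob}\big( \xi^T H \xi < \gamma\, E(\xi^T H \xi) \big)
&= \mathrm{Prob}\Big( \sum_{i=1}^{\bar{r}} \bar{\lambda}_i |\bar{\xi}_i|^2 < \gamma \Big) \\
&\le \mathrm{Prob}\big( \bar{\lambda}_1 |\bar{\xi}_1|^2 < \gamma \big) \\
&\le \mathrm{Prob}\big( |\bar{\xi}_1|^2 < \gamma/\alpha \big) \\
&\le \sqrt{\frac{2\gamma}{\pi\alpha}},
\end{aligned}
\tag{3.3}
\]

where the last step is due to the fact that $\bar{\xi}_1$ is a real-valued zero mean Gaussian random variable
with unit variance.

In the second case, we have $\bar{\lambda}_1 < \alpha$, so that

\[
\bar{\lambda}_2 + \cdots + \bar{\lambda}_{\bar{r}} = 1 - \bar{\lambda}_1 > 1 - \alpha.
\]

This further implies $(\bar{r} - 1)\bar{\lambda}_2 \ge \bar{\lambda}_2 + \cdots + \bar{\lambda}_{\bar{r}} > 1 - \alpha$. Hence

\[
\bar{\lambda}_1 \ge \bar{\lambda}_2 > \frac{1 - \alpha}{\bar{r} - 1}.
\]

Using this bound, we obtain the following probability estimate:

\[
\begin{aligned}
\mathrm{Prob}\big( \xi^T H \xi < \gamma\, E(\xi^T H \xi) \big)
&= \mathrm{Prob}\Big( \sum_{i=1}^{\bar{r}} \bar{\lambda}_i |\bar{\xi}_i|^2 < \gamma \Big) \\
&\le \mathrm{Prob}\big( \bar{\lambda}_1 |\bar{\xi}_1|^2 < \gamma, \ \bar{\lambda}_2 |\bar{\xi}_2|^2 < \gamma \big) \\
&= \mathrm{Prob}\big( \bar{\lambda}_1 |\bar{\xi}_1|^2 < \gamma \big) \cdot \mathrm{Prob}\big( \bar{\lambda}_2 |\bar{\xi}_2|^2 < \gamma \big) \\
&\le \sqrt{\frac{2\gamma}{\pi\bar{\lambda}_1}} \cdot \sqrt{\frac{2\gamma}{\pi\bar{\lambda}_2}} \\
&\le \frac{2(\bar{r} - 1)\gamma}{\pi(1 - \alpha)}.
\end{aligned}
\tag{3.4}
\]

Combining the estimates for the above two cases and setting $\alpha = 2/\pi$, we immediately obtain the
desired bound (3.2). ■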
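
As a sanity check on (3.2), the following Python sketch (ours, not from the paper) estimates the left-tail probability by simulation in the simplest setting, where we assume $Z = I$ and $H$ is diagonal with strictly positive eigenvalues, so that $\bar{r}$ is the number of eigenvalues. The eigenvalue lists, $\gamma$, the seed and the trial count are arbitrary choices.

```python
import math
import random

def left_tail_estimate(eigenvalues, gamma, trials, rng):
    """Estimate Prob(sum_i lam_i * xi_i^2 < gamma * sum_i lam_i) for xi ~ N(0, I)."""
    threshold = gamma * sum(eigenvalues)
    hits = 0
    for _ in range(trials):
        value = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        if value < threshold:
            hits += 1
    return hits / trials

def lemma_3_1_bound(r_bar, gamma):
    """Right-hand side of (3.2)."""
    return max(math.sqrt(gamma), 2.0 * (r_bar - 1) * gamma / (math.pi - 2.0))

rng = random.Random(1)
gamma = 0.05
results = []
for eigenvalues in ([1.0], [1.0, 1.0], [3.0, 1.0, 0.5]):
    estimate = left_tail_estimate(eigenvalues, gamma, 20000, rng)
    results.append((estimate, lemma_3_1_bound(len(eigenvalues), gamma)))
```

In the rank-one case the estimate sits fairly close to $\sqrt{\gamma}$, which suggests that the first branch of the maximum cannot be improved by much.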


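Lemma 3.2. Let $\mathbb{F} = \mathbb{R}$. Let $Z^* \succeq 0$ be a feasible solution of (3.1) and let $z^*(\xi)$ be generated
by the randomization procedure described earlier. Then, with probability 1, $z^*(\xi)$ is well defined
and feasible for (1.1). Moreover, for every $\gamma > 0$ and $\mu > 0$,

\[
\mathrm{Prob}\Big( \min_{1 \le i \le m} \xi^T H_i \xi \ge \gamma, \ \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \Big)
\ge 1 - m \cdot \max\Big\{ \sqrt{\gamma}, \ \frac{2(r - 1)\gamma}{\pi - 2} \Big\} - \frac{1}{\mu},
\tag{3.5}
\]

where $r := \mathrm{rank}(Z^*)$.

Proof. Since $Z^* \succeq 0$ is feasible for (3.1), it follows that $\mathrm{Tr}(H_i Z^*) \ge 1$ for all $i = 1, \ldots, m$. Since
$E(\xi^T H_i \xi) = \mathrm{Tr}(H_i Z^*) \ge 1$ and the density of $\xi^T H_i \xi$ is absolutely continuous, the probability of
$\xi^T H_i \xi = 0$ is zero, implying that $z^*(\xi)$ is well defined with probability 1. The feasibility of $z^*(\xi)$
is easily verified.

To prove (3.5), we first note that $E(\xi\xi^T) = Z^*$. Thus, for any $\gamma > 0$ and $\mu > 0$,

\[
\begin{aligned}
&\mathrm{Prob}\Big( \min_{1 \le i \le m} \xi^T H_i \xi \ge \gamma, \ \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \Big) \\
&\quad = \mathrm{Prob}\big( \xi^T H_i \xi \ge \gamma \ \forall\, i = 1, \ldots, m \text{ and } \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \big) \\
&\quad \ge \mathrm{Prob}\big( \xi^T H_i \xi \ge \gamma\, \mathrm{Tr}(H_i Z^*) \ \forall\, i = 1, \ldots, m \text{ and } \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \big) \\
&\quad = \mathrm{Prob}\big( \xi^T H_i \xi \ge \gamma\, E(\xi^T H_i \xi) \ \forall\, i = 1, \ldots, m \text{ and } \|\xi\|^2 \le \mu\, E(\|\xi\|^2) \big) \\
&\quad = 1 - \mathrm{Prob}\big( \xi^T H_i \xi < \gamma\, E(\xi^T H_i \xi) \text{ for some } i \text{ or } \|\xi\|^2 > \mu\, E(\|\xi\|^2) \big) \\
&\quad \ge 1 - \sum_{i=1}^{m} \mathrm{Prob}\big( \xi^T H_i \xi < \gamma\, E(\xi^T H_i \xi) \big) - \mathrm{Prob}\big( \|\xi\|^2 > \mu\, E(\|\xi\|^2) \big) \\
&\quad > 1 - m \cdot \max\Big\{ \sqrt{\gamma}, \ \frac{2(r - 1)\gamma}{\pi - 2} \Big\} - \frac{1}{\mu},
\end{aligned}
\]

where the last step uses Lemma 3.1 as well as Markov’s inequality:

\[
\mathrm{Prob}\big( \|\xi\|^2 > \mu\, E(\|\xi\|^2) \big) \le \frac{1}{\mu}.
\]

This completes the proof. ■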


We  now  use  Lemma  3.2  to  bound  the  performance  of  the  SDP  relaxation. 


Theorem 3.3. Let $\mathbb{F} = \mathbb{R}$. For the QP (1.1) and its SDP relaxation (3.1), we have $v_{qp} = v_{sdp}$
if $m \le 2$, and otherwise

\[
v_{qp} \le \frac{27 m^2}{\pi}\, v_{sdp}.
\]

Proof. By applying a suitable rank reduction procedure if necessary, we can assume that the
rank $r$ of the optimal SDP solution $Z^*$ satisfies $r(r + 1)/2 \le m$; see e.g. [17]. Thus $r < \sqrt{2m}$. If
$m \le 2$, then $r = 1$, implying that $Z^* = z^*(z^*)^T$ for some $z^* \in \mathbb{R}^n$ and it is readily seen that $z^*$ is
an optimal solution of (1.1), so that $v_{qp} = v_{sdp}$. Otherwise, we apply the randomization procedure
to $Z^*$. We also choose

\[
\mu = 3, \qquad \gamma = \frac{\pi}{9 m^2}.
\]

Then, it is easily verified using $r < \sqrt{2m}$ that

\[
\sqrt{\gamma} \ge \frac{2(r - 1)\gamma}{\pi - 2} \quad \forall\, m = 1, 2, \ldots
\]

Plugging these choices of $\gamma$ and $\mu$ into (3.5), we see that there is a positive probability (independent
of problem size) of at least

\[
1 - m\sqrt{\gamma} - \frac{1}{\mu} = 1 - \frac{\sqrt{\pi}}{3} - \frac{1}{3} = 0.0758\ldots
\]

that $\xi$ generated by the randomization procedure satisfies

\[
\min_{1 \le i \le m} \xi^T H_i \xi \ge \frac{\pi}{9 m^2} \quad \text{and} \quad \|\xi\|^2 \le 3\, \mathrm{Tr}(Z^*).
\]

Let $\xi$ be any vector satisfying these two conditions.$^2$ Then, $z^*(\xi)$ is feasible for (1.1), so that

\[
v_{qp} \le \|z^*(\xi)\|^2 = \frac{\|\xi\|^2}{\min_i \xi^T H_i \xi} \le \frac{3\, \mathrm{Tr}(Z^*)}{\pi/(9 m^2)} = \frac{27 m^2}{\pi}\, v_{sdp},
\]

where the last equality uses $\mathrm{Tr}(Z^*) = v_{sdp}$. ■
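
The constants in this proof are easy to reproduce. The Python sketch below is ours (function names are hypothetical): for $\mu = 3$ and $\gamma = \pi/(9m^2)$ it evaluates the success probability $1 - m\sqrt{\gamma} - 1/\mu$, the resulting ratio $\mu/\gamma$, and the chance that a given number of independent trials all fail.

```python
import math

def theorem_3_3_constants(m):
    """Return (gamma, success_probability, ratio) for mu = 3, gamma = pi / (9 m^2)."""
    mu = 3.0
    gamma = math.pi / (9.0 * m * m)
    # Right-hand side of (3.5), assuming sqrt(gamma) is the larger term of the max.
    success = 1.0 - m * math.sqrt(gamma) - 1.0 / mu
    ratio = mu / gamma   # equals 27 m^2 / pi
    return gamma, success, ratio

def failure_probability(success, trials):
    """Upper bound on the probability that all independent trials miss."""
    return (1.0 - success) ** trials

gamma, success, ratio = theorem_3_3_constants(10)
miss_100 = failure_probability(success, 100)
```

The success probability does not depend on $m$, because $m\sqrt{\gamma} = \sqrt{\pi}/3$ for every $m$; only the ratio grows, quadratically in $m$.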


In the above proof, other choices of $\mu$ can also be used, but the resulting bound seems not as
sharp. Theorem 3.3 suggests that the worst-case performance of the SDP relaxation deteriorates
quadratically with the number of quadratic constraints. Below we give an example demonstrating
that this bound is in fact tight up to a constant factor.

$^2$ The probability that no such $\xi$ is generated after $N$ independent trials is at most $(1 - 0.0758..)^N$, which for
$N = 100$ equals 0.000375.. Thus, such $\xi$ requires relatively few trials to generate.

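Example 1: For any $m \ge 2$ and $n \ge 2$, consider a special instance of (1.2), corresponding to (1.1)
with $|\mathcal{I}_i| = 1$ (i.e., each $H_i$ has rank 1), whereby

\[
h_\ell = \Big( \cos\frac{\ell\pi}{m}, \ \sin\frac{\ell\pi}{m}, \ 0, \ldots, 0 \Big)^T, \quad \ell = 1, \ldots, m.
\]

Let $z^* = (z_1^*, \ldots, z_n^*)^T \in \mathbb{R}^n$ be an optimal solution of (1.2) corresponding to the above choice of
steering vectors $h_\ell$. We can write

\[
(z_1^*, z_2^*) = \rho(\cos\theta, \sin\theta), \quad \text{for some } \theta \in [0, 2\pi).
\]

Since $\{\ell\pi/m, \ \ell = 1, \ldots, m\}$ is uniformly spaced on $[0, \pi)$, there must exist an integer $\ell$ such that
either

\[
\Big| \theta - \frac{\ell\pi}{m} - \frac{\pi}{2} \Big| \le \frac{\pi}{2m}
\quad \text{or} \quad
\Big| \theta - \frac{\ell\pi}{m} + \frac{\pi}{2} \Big| \le \frac{\pi}{2m}.
\]

For simplicity, we assume the first case. (The second case can be treated similarly.) Since the last
$(n - 2)$ entries of $h_\ell$ are zero, it is readily checked that

\[
|h_\ell^T z^*| = \rho \Big| \cos\Big( \theta - \frac{\ell\pi}{m} \Big) \Big|
= \rho \Big| \sin\Big( \theta - \frac{\ell\pi}{m} - \frac{\pi}{2} \Big) \Big|
\le \rho \Big| \theta - \frac{\ell\pi}{m} - \frac{\pi}{2} \Big|
\le \frac{\rho\pi}{2m}.
\]

Since $z^*$ satisfies the constraint $|h_\ell^T z^*| \ge 1$, it follows that

\[
\|z^*\| \ge \rho \ge \frac{2m}{\pi},
\]

implying

\[
v_{qp} = \|z^*\|^2 \ge \frac{4 m^2}{\pi^2}.
\]

On the other hand, the positive semidefinite matrix

\[
Z^* = \mathrm{diag}\{1, 1, 0, \ldots, 0\}
\]

is feasible for the SDP relaxation (3.1), and it has an objective value of $\mathrm{Tr}(Z^*) = 2$. Thus, for this
instance, we have

\[
v_{qp} \ge \frac{2 m^2}{\pi^2}\, v_{sdp}.
\]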
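
Example 1 is small enough to check numerically. The Python sketch below is ours and assumes, as the argument above suggests, that an optimal $z$ can be taken in the plane of the first two coordinates, $z = \rho(\cos\theta, \sin\theta, 0, \ldots, 0)$. It grid-searches the direction $\theta$; the function name and grid size are arbitrary choices. Because a grid search can only overestimate the minimum, the comparison with the lower bound $4m^2/\pi^2$ remains valid.

```python
import math

def example1_vqp(m, grid=20000):
    """Grid-search estimate (from above) of v_qp for Example 1."""
    angles = [l * math.pi / m for l in range(1, m + 1)]
    best = float("inf")
    for k in range(grid):
        theta = math.pi * k / grid   # theta and theta + pi give the same constraint values
        worst = min(abs(math.cos(theta - a)) for a in angles)
        if worst > 0.0:
            # rho^2 needed so that min_l |h_l^T z| = 1 in direction theta
            best = min(best, 1.0 / worst ** 2)
    return best

checks = []
for m in (3, 8, 20):
    vqp = example1_vqp(m)
    lower_bound = 4.0 * m * m / math.pi ** 2
    closed_form = 1.0 / math.sin(math.pi / (2 * m)) ** 2
    checks.append((m, vqp, lower_bound, closed_form))
```

In this sketch the search agrees with the value $1/\sin^2(\pi/(2m))$, obtained by pointing $z$ midway between two adjacent directions orthogonal to the $h_\ell$; this closed form is our own observation, not a statement from the paper.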


The preceding example and Theorem 3.3 show that the SDP relaxation (3.1) can be weak if
the number of quadratic constraints is large, especially when the steering vectors $h_\ell$ are in a certain
sense “uniformly distributed” in space.

3.2. General steering vectors: the complex case. We consider the complex case of
$\mathbb{F} = \mathbb{C}$. We will show that the performance ratio of the SDP relaxation (3.1) improves to $O(m)$
in the complex case (as opposed to $O(m^2)$ in the real case). Similar to the real case, upon
obtaining an optimal solution $Z^*$ of (3.1), we construct a feasible solution of (1.1) using the
following randomization procedure:

1. Generate a random vector $\xi \in \mathbb{C}^n$ from the complex-valued normal distribution
$N_c(0, Z^*)$ [2, 26].

2. Let $z^*(\xi) = \xi \big/ \min_{1 \le i \le m} \sqrt{\xi^H H_i \xi}$.
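
The two steps above can be sketched in plain Python as follows. This is an illustration under our own assumptions, not the authors' implementation: we assume a factor $W$ with $Z = W W^H$ is available (so that $\xi = W\eta$ with $\eta \sim N_c(0, I)$ has covariance $Z$), and we use a small hypothetical instance with $n = 2$, nine unit-norm steering vectors, and the feasible (not necessarily optimal) matrix $Z = I_2$.

```python
import math
import random

def sample_complex_normal(W, rng):
    """Draw xi = W eta with eta ~ N_c(0, I), so that E[xi xi^H] = W W^H."""
    n, r = len(W), len(W[0])
    s = math.sqrt(0.5)   # real and imaginary parts each have variance 1/2
    eta = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(r)]
    return [sum(W[i][k] * eta[k] for k in range(r)) for i in range(n)]

def quad_form(h_group, x):
    """x^H H_i x with H_i = sum over the group of h h^H."""
    return sum(abs(sum(hc.conjugate() * xc for hc, xc in zip(h, x))) ** 2
               for h in h_group)

def randomize(W, h_groups, rng):
    """Steps 1 and 2: sample xi, then scale by the smallest constraint value."""
    xi = sample_complex_normal(W, rng)
    scale = math.sqrt(min(quad_form(g, xi) for g in h_groups))
    return [xc / scale for xc in xi]

rng = random.Random(0)
K = 3
h_groups = [
    [[complex(math.cos(j * math.pi / K)),
      math.sin(j * math.pi / K)
      * complex(math.cos(2 * k * math.pi / K), math.sin(2 * k * math.pi / K))]]
    for j in range(1, K + 1) for k in range(1, K + 1)
]
W = [[1 + 0j, 0j], [0j, 1 + 0j]]   # Z = I_2 is feasible since every ||h_l|| = 1

best = None
for _ in range(30):
    z = randomize(W, h_groups, rng)
    norm_sq = sum(abs(zc) ** 2 for zc in z)
    if best is None or norm_sq < best[0]:
        best = (norm_sq, z)
```

Keeping the best of several independent draws, as in the loop above, is the usual way such a procedure is used in practice; every draw is feasible by construction, and only its norm varies.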

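Most of the ensuing performance analysis is similar to that of the real case. In particular, we
will also need the following two lemmas analogous to Lemmas 3.1 and 3.2.

Lemma 3.4. Let $H \in \mathbb{C}^{n \times n}$, $Z \in \mathbb{C}^{n \times n}$ be two Hermitian positive semidefinite matrices (i.e.,
$H \succeq 0$, $Z \succeq 0$). Suppose $\xi \in \mathbb{C}^n$ is a random vector generated from the complex-valued normal
distribution $N_c(0, Z)$. Then, for any $\gamma > 0$,

\[
\mathrm{Prob}\big( \xi^H H \xi < \gamma\, E(\xi^H H \xi) \big) \le \max\Big\{ \frac{4}{3}\gamma, \ 16(\bar{r} - 1)^2 \gamma^2 \Big\},
\tag{3.6}
\]

where $\bar{r} := \min\{\mathrm{rank}(H), \mathrm{rank}(Z)\}$.

Proof. We follow the same notations and proof as for Lemma 3.1, except for two blanket
changes:

matrix transpose → Hermitian transpose,
orthogonal matrix → unitary matrix.

Also, $\bar{\xi}$ has the complex-valued normal distribution $N_c(0, I_r)$. With these changes, we consider the
same two cases: $\bar{\lambda}_1 \ge \alpha$ and $\bar{\lambda}_1 < \alpha$, where $0 < \alpha < 1$. In the first case, we have similar to (3.3)
that

\[
\mathrm{Prob}\big( \xi^H H \xi < \gamma\, E(\xi^H H \xi) \big) \le \mathrm{Prob}\big( |\bar{\xi}_1|^2 < \gamma/\alpha \big).
\tag{3.7}
\]

Recall that the density function of a complex-valued circular normal random variable $u \sim N_c(0, \sigma^2)$,
where $\sigma$ is the standard deviation, is

\[
\frac{1}{\pi\sigma^2}\, e^{-|u|^2/\sigma^2} \quad \forall\, u \in \mathbb{C}.
\]

In polar coordinates, the density function can be written as

\[
f(\rho, \theta) = \frac{\rho}{\pi\sigma^2}\, e^{-\rho^2/\sigma^2} \quad \forall\, \rho \in [0, +\infty), \ \theta \in [0, 2\pi).
\]

In fact, a complex-valued normal distribution can be viewed as a joint distribution of its modulus
and its argument, with the following particular properties: (1) the modulus and argument are
independently distributed; (2) the argument is uniformly distributed over $[0, 2\pi)$; (3) the modulus
follows a Weibull distribution with density

\[
f(\rho) = \begin{cases} \dfrac{2\rho}{\sigma^2}\, e^{-\rho^2/\sigma^2} & \text{if } \rho \ge 0; \\[1ex] 0 & \text{if } \rho < 0, \end{cases}
\]

and distribution function

\[
\mathrm{Prob}\{ |u| \le t \} = 1 - e^{-t^2/\sigma^2}.
\tag{3.8}
\]

Since $\bar{\xi}_1 \sim N_c(0, 1)$, substituting this into (3.7) yields

\[
\mathrm{Prob}\big( \xi^H H \xi < \gamma\, E(\xi^H H \xi) \big) \le \mathrm{Prob}\big( |\bar{\xi}_1|^2 < \gamma/\alpha \big) \le 1 - e^{-\gamma/\alpha} \le \gamma/\alpha,
\]

where the last inequality uses the convexity of the exponential function.

In the second case of $\bar{\lambda}_1 < \alpha$, we have similar to (3.4) that

\[
\begin{aligned}
\mathrm{Prob}\big( \xi^H H \xi < \gamma\, E(\xi^H H \xi) \big)
&\le \mathrm{Prob}\big( \bar{\lambda}_1 |\bar{\xi}_1|^2 < \gamma \big) \cdot \mathrm{Prob}\big( \bar{\lambda}_2 |\bar{\xi}_2|^2 < \gamma \big) \\
&= \big( 1 - e^{-\gamma/\bar{\lambda}_1} \big) \big( 1 - e^{-\gamma/\bar{\lambda}_2} \big) \\
&\le \frac{\gamma^2}{\bar{\lambda}_1 \bar{\lambda}_2} \\
&\le \frac{(\bar{r} - 1)^2 \gamma^2}{(1 - \alpha)^2},
\end{aligned}
\]

where the last step uses the fact that $\bar{\lambda}_1 \ge \bar{\lambda}_2 \ge (1 - \alpha)/(\bar{r} - 1)$. Combining the estimates for the
above two cases and setting $\alpha = 3/4$, we immediately obtain the desired bound (3.6). ■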


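Lemma 3.5. Let $\mathbb{F} = \mathbb{C}$. Let $Z^* \succeq 0$ be a feasible solution of (3.1) and let $z^*(\xi)$ be generated
by the randomization procedure described earlier. Then, with probability 1, $z^*(\xi)$ is well defined
and feasible for (1.1). Moreover, for every $\gamma > 0$ and $\mu > 0$,

\[
\mathrm{Prob}\Big( \min_{1 \le i \le m} \xi^H H_i \xi \ge \gamma, \ \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \Big)
\ge 1 - m \cdot \max\Big\{ \frac{4}{3}\gamma, \ 16(r - 1)^2 \gamma^2 \Big\} - \frac{1}{\mu},
\]

where $r := \mathrm{rank}(Z^*)$.

Proof. The proof is mostly the same as that for the real case (see Lemma 3.2). In particular,
for any $\gamma > 0$ and $\mu > 0$, we still have

\[
\mathrm{Prob}\Big( \min_{1 \le i \le m} \xi^H H_i \xi \ge \gamma, \ \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \Big)
\ge 1 - \sum_{i=1}^{m} \mathrm{Prob}\big( \xi^H H_i \xi < \gamma\, E(\xi^H H_i \xi) \big) - \mathrm{Prob}\big( \|\xi\|^2 > \mu\, E(\|\xi\|^2) \big).
\]

Therefore, we can invoke Lemma 3.4 to obtain

\[
\begin{aligned}
\mathrm{Prob}\Big( \min_{1 \le i \le m} \xi^H H_i \xi \ge \gamma, \ \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \Big)
&\ge 1 - m \cdot \max\Big\{ \frac{4}{3}\gamma, \ 16(r - 1)^2 \gamma^2 \Big\} - \mathrm{Prob}\big( \|\xi\|^2 > \mu\, E(\|\xi\|^2) \big) \\
&\ge 1 - m \cdot \max\Big\{ \frac{4}{3}\gamma, \ 16(r - 1)^2 \gamma^2 \Big\} - \frac{1}{\mu},
\end{aligned}
\]

which completes the proof. ■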


Theorem 3.6. Let $\mathbb{F} = \mathbb{C}$. For the QP (1.1) and its SDP relaxation (3.1), we have $v_{sdp} = v_{qp}$
if $m \le 3$ and otherwise

\[
v_{qp} \le 8m \cdot v_{sdp}.
\]

Proof. By applying a suitable rank reduction procedure if necessary, we can assume that the
rank $r$ of the optimal SDP solution $Z^*$ satisfies $r = 1$ if $m \le 3$ and $r \le \sqrt{m}$ if $m \ge 4$; see [9, §5].
Thus, if $m \le 3$, then $Z^* = z^*(z^*)^H$ for some $z^* \in \mathbb{C}^n$ and it is readily seen that $z^*$ is an optimal
solution of (1.1), so that $v_{sdp} = v_{qp}$. Otherwise, we apply the randomization procedure to $Z^*$. By
choosing $\mu = 2$ and $\gamma = \frac{1}{4m}$, it is easily verified using $r \le \sqrt{m}$ that

\[
\frac{4}{3}\gamma \ge 16(r - 1)^2 \gamma^2 \quad \forall\, m = 1, 2, \ldots
\]

Therefore, it follows from Lemma 3.5 that

\[
\mathrm{Prob}\Big( \min_{1 \le i \le m} \xi^H H_i \xi \ge \gamma, \ \|\xi\|^2 \le \mu\, \mathrm{Tr}(Z^*) \Big) \ge 1 - m\,\frac{4}{3}\gamma - \frac{1}{\mu} = \frac{1}{6}.
\]

Then, similar to the proof of Theorem 3.3, we obtain that with probability of at least 1/6, $z^*(\xi)$
is a feasible solution of (1.1) and $v_{qp} \le \|z^*(\xi)\|^2 \le 8m \cdot v_{sdp}$.$^3$ ■


The proof of Theorem 3.6 shows that, by repeating the randomization procedure, the probability
of generating a feasible solution with a performance ratio no more than $8m$ approaches
1 exponentially fast (independent of problem size). Alternatively, a de-randomization technique
from theoretical computer science can perhaps convert the above randomization procedure into a
polynomial-time deterministic algorithm [12]; also see [14].

Theorem 3.6 shows that the worst-case performance of SDP relaxation deteriorates linearly
with the number of quadratic constraints. This contrasts with the quadratic rate of deterioration
in the real case (see Theorem 3.3). Thus, the SDP relaxation can yield better performance in the
complex case. This is in the same spirit as the recent results in [26] which showed that the quality
of SDP relaxation improves by a constant factor for certain quadratic maximization problems
when the space is changed from $\mathbb{R}^n$ to $\mathbb{C}^n$. Below we give an example demonstrating that this
approximation bound is tight up to a constant factor.

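$^3$ The probability that no such $\xi$ is generated after $N$ independent trials is at most $(5/6)^N$, which for $N = 30$
equals 0.00421.. Thus, such $\xi$ requires relatively few trials to generate.

Example 2: For any $m \ge 2$ and $n \ge 2$, let $K = \lceil \sqrt{m} \rceil$ (so $K \ge 2$). Consider a special instance of
(1.2), corresponding to (1.1) with $|\mathcal{I}_i| = 1$ (i.e., each $H_i$ has rank 1), whereby

\[
h_\ell = \Big( \cos\frac{j\pi}{K}, \ \sin\frac{j\pi}{K}\, e^{i 2k\pi/K}, \ 0, \ldots, 0 \Big)^T \quad \text{with } \ell = jK - K + k, \quad j, k = 1, \ldots, K.
\]

Hence there are $K^2$ complex rank-1 constraints. Let $z^* = (z_1^*, \ldots, z_n^*)^T \in \mathbb{C}^n$ be an optimal solution
of (1.2) corresponding to the above choice of $\lceil \sqrt{m} \rceil^2$ steering vectors $h_\ell$. By a phase rotation if
necessary, we can without loss of generality assume that $z_1^*$ is real and write

\[
(z_1^*, z_2^*) = \rho(\cos\theta, \ \sin\theta\, e^{i\psi}), \quad \text{for some } \theta, \psi \in [0, 2\pi).
\]

Since $\{2k\pi/K, \ k = 1, \ldots, K\}$ and $\{j\pi/K, \ j = 1, \ldots, K\}$ are uniformly spaced in $[0, 2\pi)$ and $[0, \pi)$
respectively, there must exist integers $j$ and $k$ such that

\[
\Big| \psi - \frac{2k\pi}{K} \Big| \le \frac{\pi}{K}
\quad \text{and either} \quad
\Big| \theta - \frac{j\pi}{K} - \frac{\pi}{2} \Big| \le \frac{\pi}{2K}
\quad \text{or} \quad
\Big| \theta - \frac{j\pi}{K} + \frac{\pi}{2} \Big| \le \frac{\pi}{2K}.
\]

Without loss of generality, we assume

\[
\Big| \theta - \frac{j\pi}{K} - \frac{\pi}{2} \Big| \le \frac{\pi}{2K}.
\]

Since the last $(n - 2)$ entries of each $h_\ell$ are zero, it is readily seen that, for $\ell = jK - K + k$,

\[
\begin{aligned}
\big| \mathrm{Re}(h_\ell^H z^*) \big|
&= \rho \Big| \cos\theta \cos\frac{j\pi}{K} + \sin\theta \sin\frac{j\pi}{K} \cos\Big( \psi - \frac{2k\pi}{K} \Big) \Big| \\
&= \rho \Big| \cos\Big( \theta - \frac{j\pi}{K} \Big) + \sin\theta \sin\frac{j\pi}{K} \Big( \cos\Big( \psi - \frac{2k\pi}{K} \Big) - 1 \Big) \Big| \\
&= \rho \Big| \sin\Big( \theta - \frac{j\pi}{K} - \frac{\pi}{2} \Big) - 2 \sin\theta \sin\frac{j\pi}{K} \sin^2\Big( \frac{K\psi - 2k\pi}{2K} \Big) \Big| \\
&\le \rho \Big| \sin\Big( \theta - \frac{j\pi}{K} - \frac{\pi}{2} \Big) \Big| + 2\rho \sin^2\Big( \frac{K\psi - 2k\pi}{2K} \Big) \\
&\le \frac{\rho\pi}{2K} + \frac{\rho\pi^2}{2K^2}.
\end{aligned}
\]

In addition, we have

\[
\big| \mathrm{Im}(h_\ell^H z^*) \big|
= \rho \Big| \sin\theta \sin\frac{j\pi}{K} \sin\Big( \psi - \frac{2k\pi}{K} \Big) \Big|
\le \rho \Big| \sin\Big( \psi - \frac{2k\pi}{K} \Big) \Big|
\le \rho \Big| \psi - \frac{2k\pi}{K} \Big|
\le \frac{\rho\pi}{K}.
\]

Combining the above two bounds, we obtain

\[
|h_\ell^H z^*| \le \big| \mathrm{Re}(h_\ell^H z^*) \big| + \big| \mathrm{Im}(h_\ell^H z^*) \big| \le \frac{\rho\pi(3K + \pi)}{2K^2}.
\]

Since $z^*$ satisfies the constraint $|h_\ell^H z^*| \ge 1$, it follows that

\[
\|z^*\| \ge \rho \ge \frac{2K^2}{\pi(3K + \pi)},
\]

implying

\[
v_{qp} = \|z^*\|^2 \ge \frac{4K^4}{\pi^2(3K + \pi)^2} = \frac{4 \lceil \sqrt{m} \rceil^4}{\pi^2 (3\lceil \sqrt{m} \rceil + \pi)^2}.
\]

On the other hand, the positive semidefinite matrix

\[
Z^* = \mathrm{diag}\{1, 1, 0, \ldots, 0\}
\]

is feasible for the SDP relaxation (3.1), and it has an objective value of $\mathrm{Tr}(Z^*) = 2$. Thus, for this
instance, we have

\[
v_{qp} \ge \frac{2 \lceil \sqrt{m} \rceil^4}{\pi^2 (3\lceil \sqrt{m} \rceil + \pi)^2}\, v_{sdp} \ge \frac{2m}{\pi^2 (3 + \pi/2)^2}\, v_{sdp}.
\]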

The preceding example and Theorem 3.6 show that the SDP relaxation (3.1) can be weak if
the number of quadratic constraints is large, especially when the steering vectors $h_\ell$ are in a certain
sense “uniformly distributed” in space. In the next subsection, we will tighten the approximation
bound in Theorem 3.6 by considering special cases where the steering vectors are “not too spread
out in space”.

3.3. Specially configured steering vectors: the complex case. We consider the complex
case of $\mathbb{F} = \mathbb{C}$. Let $Z^*$ be any optimal solution of (3.1). Since $Z^*$ is feasible for (3.1), $Z^* \ne 0$.
Then

\[
Z^* = \sum_{k=1}^{r} w_k w_k^H,
\tag{3.9}
\]

for some nonzero $w_k \in \mathbb{C}^n$, where $r := \mathrm{rank}(Z^*) \ge 1$. By decomposing $w_k = u_k + v_k$, with
$u_k \in \mathrm{span}\{h_1, \ldots, h_M\}$ and $v_k \in \mathrm{span}\{h_1, \ldots, h_M\}^{\perp}$, it is easily checked that $\bar{Z} := \sum_{k=1}^{r} u_k u_k^H$ is
feasible for (3.1) and

\[
\langle I, Z^* \rangle = \sum_{k=1}^{r} \|u_k + v_k\|^2 = \sum_{k=1}^{r} \big( \|u_k\|^2 + \|v_k\|^2 \big) = \langle I, \bar{Z} \rangle + \sum_{k=1}^{r} \|v_k\|^2.
\]

This implies $v_k = 0$ for all $k$, so that

\[
w_k \in \mathrm{span}\{h_1, \ldots, h_M\}.
\tag{3.10}
\]

Below we show that the SDP relaxation (3.1) provides a constant factor approximation to the
QP (1.1) when the phase spread of the entries of $h_\ell$ is bounded away from $\pi/2$.

Theorem  3.7.  Suppose  that 

p 

(3.11)  he  =  Y,Pi«9i  V  £  =  1, ...,  M, 

i= 1 

for  some  p  >  1,  G  (C  and  gt  G  (Dn  such  that  ||</j||  =  1  and  g^1  gj  =  0  for  all  i  ^  j.  Then  the 
following  results  hold. 
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(a)  If  Re(/?^ fljd)  >  0  whenever  P^/3 jt  ^  0,  then  uqp  <  Cved  ,  where 


(3.12) 


(b)  If  (3U  =  \/3u \el<t>u,  where 


C  :=  max 


1  +  lIm(A?/3^)l2x  1/2 


i,j,e  I  o  V  \Re(Pu  Pp)  I 


7 r 


(3.13)  (\>m  G  —  <f>,  <f>i  +  0]  V  i,  £,  for  some  0  <  <j>  <  —  and  some  <f>£  €  IR, 

then  Re(/3?f  /3je)  >  0  whenever  /3^/3jc  0,  and  C  given  by  (3.12)  satisfies 

1 


(3.14) 

Proof,  (a)  By  (3.10),  we  have 


C< 


cos  (2  <j>) 
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p 

i— 1 


for  some  (iki  G  C.  This  together  with  (3.9)  yields 


(i,z*)  =  j2 


wk\ 


k= 1 
r  p 

=  EE 

k—1  i=  1 


E 

fc=i 


E 


Q-kiQi 


O'ki  | 


EAt 


where  the  third  equality  uses  the  orthonormal  properties  of  gi,...,gp,  and  the  last  equality  uses 

A*:=  (ELi  l^feil2)172  =  IIKi)fe=ill- 

Let 


:=  E  Xi9i‘ 


Then,  the  orthonormal  properties  of  g\,  ■  ■■,  gp  yields 


(3.15) 


U*||2  = 


p 

'y  '  ^iOi 

i—1 


=  EA  l  =  (kz*)  = 


Moreover,  for  each  £  G  {1,  we  obtain  from  (3.9)  that 

r  r 

(hthf,Z*)  =  ^{hthf  ,wkw%)  =  y]  \hfwk\2 


k= 1 


=  E 

fc=i 


Y«kih?gi 


k= i 


=  E 

k=l 


2 
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r  p  p 


p  p 


=  Re  IEEE  akiakj0U  fiji  I  ( EE^E 

,  k—l  1=1  1  =  1  )  \i=l  i=l  k= 1 


akiakj 


P  P 


=  EERe  /&E 

1=  1  1=1  V 


akiakj 


fc=  1 


<EEI^1 

*=i i=i 


^  1  akiakj 


k=  1 


<  EE  1^1  IIKi)UIIIIKi)^l 

*=1 1=1 


p  p 


—  E/  E I  ^  I  ^i  > 

1=1 1=1 


where  the  fourth  equality  uses  (3.11)  and  the  orthonormal  properties  of  cq, the  last  inequality 
is  due  to  the  Cauchy-Schwarz  inequality.  Then,  it  follows  that 

{hthf,Z*)  <  E  E  (lRe(^)|2  +  Ilm(fe)|2)1/2  XiXj 

1=1  l=i 

-VVlRrtd^d  d'Yi  i  lIm(^)l2V/2\\ 

p  p 

^EE  |Re(/3-jf/?#)|  CAjAj 
1=1  l=i 

P  P 

=  EE  Re  {0$  Pje)CXi\j, 

1=1 l=i 

where  the  summation  in  the  second  step  is  taken  over  i,j  with  0^0 jt  ^  0,  the  third  step  is  due  to 
(3.12),  and  the  last  step  is  due  to  the  assumption  that  Re(0]j0j()  >  0  whenever  0^0 jt  7^  0.  Also, 
we  have  from  (3.11)  and  the  orthonormal  properties  of  g±, ..., gp  that 


Kz*\2  = 


E  X>h?9i 


i= 1 


^  (  A 40i£ 


i=l 


=  EE  AiAiRe(/3^ 0jt). 

1=1 1=1 


Comparing  the  above  two  displayed  equations,  we  see  that 

(heh?,Z*)  <  C\h?z*\2,  i=\ 

Since  Z*  is  feasible  for  (3.1),  this  shows  that  VCz*  is  feasible  for  (1.1),  which  further  implies 


Uqp  < 


VCz 


=  C\\z*\\2  =  Cvsdp. 


This  proves  the  desired  result. 
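(b) The condition (3.13) implies that $|\phi_{i\ell} - \phi_{j\ell}| \le 2\phi < \pi/2$. In other words, the phase angle spread of the entries of each $\beta_\ell = (\beta_{1\ell}, \beta_{2\ell}, \ldots, \beta_{p\ell})^T$ is no more than $2\phi$. This further implies that

$$\cos(\phi_{i\ell} - \phi_{j\ell}) \ge \cos(2\phi) \qquad \forall\ i,j,\ell. \tag{3.16}$$

We have

$$\beta_{i\ell}^H \beta_{j\ell} = |\beta_{i\ell}|\, e^{-\mathrm{i}\phi_{i\ell}}\, |\beta_{j\ell}|\, e^{\mathrm{i}\phi_{j\ell}} = |\beta_{i\ell}||\beta_{j\ell}| \big(\cos(\phi_{j\ell} - \phi_{i\ell}) + \mathrm{i}\sin(\phi_{j\ell} - \phi_{i\ell})\big).$$

Since $|\phi_{i\ell} - \phi_{j\ell}| < \pi/2$ so that $\cos(\phi_{j\ell} - \phi_{i\ell}) > 0$, we see that $\mathrm{Re}(\beta_{i\ell}^H \beta_{j\ell}) > 0$ whenever $\beta_{i\ell}^H \beta_{j\ell} \ne 0$. Then

$$\Big(1 + \frac{|\mathrm{Im}(\beta_{i\ell}^H \beta_{j\ell})|^2}{|\mathrm{Re}(\beta_{i\ell}^H \beta_{j\ell})|^2}\Big)^{1/2} = \big(1 + \tan^2(\phi_{j\ell} - \phi_{i\ell})\big)^{1/2} = \frac{1}{\cos(\phi_{j\ell} - \phi_{i\ell})} \le \frac{1}{\cos(2\phi)},$$

where the last step uses (3.16). Using this in (3.12) completes the proof. ■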


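In Theorem 3.7(b), we can more generally consider $\beta_{i\ell}$ of the form $\beta_{i\ell} = \omega_{i\ell}\, e^{\mathrm{i}\phi_{i\ell}} (1 + \mathrm{i}\theta_{i\ell})$, where $\omega_{i\ell} \ge 0$, $\phi_{i\ell}$ satisfies (3.13), and

$$|\theta_{j\ell} - \theta_{i\ell}| \le \sigma\, |1 + \theta_{i\ell}\theta_{j\ell}| \quad \forall\ i,j,\ell, \quad \text{for some } \sigma \ge 0 \text{ with } \tan(2\phi)\,\sigma < 1. \tag{3.17}$$

Then the proof of Theorem 3.7(b) can be extended to show the following upper bound on $C$ given by (3.12):

$$C \le \frac{1}{\cos(2\phi)} \cdot \frac{\sqrt{1+\sigma^2}}{1 - \tan(2\phi)\,\sigma}. \tag{3.18}$$

However, this generalization is superficial, as we can also derive (3.18) from (3.14) by rewriting $\beta_{i\ell}$ as

$$\beta_{i\ell} = |\beta_{i\ell}|\, e^{\mathrm{i}\tilde\phi_{i\ell}} \quad \text{with} \quad \tilde\phi_{i\ell} = \phi_{i\ell} + \tan^{-1}(\theta_{i\ell}).$$

Then, applying (3.14) yields $C \le 1/\cos(2\tilde\phi)$, where $\tilde\phi = \max_{i,j,\ell} |\tilde\phi_{i\ell} - \tilde\phi_{j\ell}|/2$. Using trigonometric identities, it can be shown that $1/\cos(2\tilde\phi)$ equals the right-hand side of (3.18) with

$$\sigma = \max_{i,j,\ell\ :\ \theta_{i\ell}\theta_{j\ell} \ne -1} \frac{|\theta_{j\ell} - \theta_{i\ell}|}{|1 + \theta_{i\ell}\theta_{j\ell}|}.$$

Notice that Theorem 3.7(b) implies that if $\phi = 0$, then the SDP relaxation (3.1) is tight for the quadratically constrained QP (1.1) with $\mathbb{F} = \mathbb{C}$. Such is the case when all components of $h_\ell$, $\ell = 1,\ldots,M$, are real and nonnegative.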


4. A convex QP restriction. In this subsection, we consider a convex quadratic programming restriction of (1.2) in the complex case of $\mathbb{F} = \mathbb{C}$ and analyze its approximation bound. Let us write $h_\ell$ (the channel steering vector) as

$$h_\ell = \big(\ldots, |h_{j\ell}|\, e^{\mathrm{i}\phi_{j\ell}}, \ldots\big)^T_{j=1,\ldots,n}.$$
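For any $\bar\phi_j \in [0, 2\pi)$, $j = 1,\ldots,n$, and any $\phi \in (0, \pi/2)$, define the four corresponding index subsets:

$$\begin{aligned}
J_\ell^1 &:= \{\, j \mid \phi_{j\ell} \in [\bar\phi_j - \phi,\ \bar\phi_j + \phi] \,\}, \\
J_\ell^2 &:= \{\, j \mid \phi_{j\ell} \in [\bar\phi_j - \phi + \pi/2,\ \bar\phi_j + \phi + \pi/2] \,\}, \\
J_\ell^3 &:= \{\, j \mid \phi_{j\ell} \in [\bar\phi_j - \phi + \pi,\ \bar\phi_j + \phi + \pi] \,\}, \\
J_\ell^4 &:= \{\, j \mid \phi_{j\ell} \in [\bar\phi_j - \phi + 3\pi/2,\ \bar\phi_j + \phi + 3\pi/2] \,\},
\end{aligned}$$

for $\ell = 1,\ldots,M$. The above four subsets are pairwise disjoint if and only if $\phi < \pi/4$, and are collectively exhaustive if and only if $\phi \ge \pi/4$. Choose an index subset $J$ with the property that

for each $\ell$, at least one of $J_\ell^1, J_\ell^2, J_\ell^3, J_\ell^4$ contains $J$.

Of course, $J = \emptyset$ is always allowable, but we should choose $J$ maximally since our approximation bound will depend on the ratio $n/|J|$ (see Theorem 4.1 below). Partition the constraint set index $\{1,\ldots,M\}$ into four subsets $K^1, K^2, K^3, K^4$ such that

$$J \subseteq J_\ell^k \qquad \forall\ \ell \in K^k,\quad k = 1,2,3,4.$$

Consider the following convex QP restriction of (1.2) corresponding to $K^1, K^2, K^3, K^4$:

$$\begin{array}{rll}
\upsilon_{\rm cqp} := & \min & \|z\|^2 \\
& \text{s.t.} & \mathrm{Re}(h_\ell^H z) \ge 1 \quad \forall\ \ell \in K^1, \\
& & -\mathrm{Im}(h_\ell^H z) \ge 1 \quad \forall\ \ell \in K^2, \\
& & -\mathrm{Re}(h_\ell^H z) \ge 1 \quad \forall\ \ell \in K^3, \\
& & \mathrm{Im}(h_\ell^H z) \ge 1 \quad \forall\ \ell \in K^4.
\end{array} \tag{4.1}$$

The above problem is a restriction of (1.2) because, for any $z \in \mathbb{C}$,

$$|z| \ge \max\{|\mathrm{Re}(z)|, |\mathrm{Im}(z)|\} = \max\{\mathrm{Re}(z), \mathrm{Im}(z), -\mathrm{Re}(z), -\mathrm{Im}(z)\}.$$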
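The inequality behind this restriction can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper name `restriction_gap` and the random test points are illustrative assumptions. A nonnegative gap means that any one of the four linear constraints in (4.1) being satisfied forces $|h_\ell^H z| \ge 1$.

```python
import random


def restriction_gap(z):
    """Return |z| - max{Re z, Im z, -Re z, -Im z} for a complex number z.

    The gap is nonnegative, which is why each linear constraint in (4.1)
    implies the modulus constraint |h^H z| >= 1 of the original problem.
    """
    return abs(z) - max(z.real, z.imag, -z.real, -z.imag)


# Hypothetical test data: 1000 random complex numbers.
rng = random.Random(0)
samples = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]
smallest_gap = min(restriction_gap(z) for z in samples)
print("smallest gap over the samples:", smallest_gap)
```

The gap is zero exactly when $z$ lies on one of the four half-axes, so the restriction loses nothing only when each $h_\ell^H z$ is real or purely imaginary.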
we can without loss of generality assume that $\bar\phi_j = 0$ for all $j$ and $\ell$.
Let  z*  denote  an  optimal  solution  of  (1.2)  and  write 


with  rj  >  0.  Then,  for  any  i,  we  have  from  hjt |  <  rjj  for  all  j  that 


1  <  \hfz*\  <  r  := 

j= i 

Also,  we  have 


=  ||z*||2  = 

i=i 


Define 


1/2 


Rk  '■  x 


\jeJk 


Sk  :=  X  rir)i- 

jeJk 


Then 


N 


N 


l<r  =  J2sk,  v^  =  J2R2k- 


k= 1 


fc=l 


Without loss of generality, assume that $R_1/S_1 = \min_k R_k/S_k$. Then, using the fact that

$$\min_k \frac{|x_k|}{|y_k|} \le \sqrt{N}\, \frac{\|x\|_2}{\|y\|_1}$$

for any $x, y \in \mathbb{R}^N$ with $y \ne 0$,$^4$ we see from the above relations that

R\  Ri 

—  <  — r 

Si  ~  Si 


Since  |  Ji|  <  |  J\,  there  is  an  injective  mapping  ir  from  J\  to  J.  Let  u>  :=  min^^  TJv^/r)j.  Define 
the  vector  z  £( Dn  by 

z-  :={  rv-iyjASWcostf)  if  j  G  tt(Ji); 
l  0  else. 


$^4$ Proof. Suppose the contrary, so that for some $x, y \in \mathbb{R}^N$ with $y \ne 0$, we have $|x_k|/|y_k| > \sqrt{N}\,\|x\|_2/\|y\|_1$ for all $k$. Then, multiplying both sides by $|y_k|$ and summing over $k$ yields $\|x\|_1 > \sqrt{N}\,\|x\|_2$, contradicting properties of 1- and 2-norms.
Then,


l*ll2  = 


Ri 


< 


Nv 


S2co2  cos2  <f>  to 2  cos2  <f> ' 

Moreover,  for  each  i  £  K1,  since  7r(  J\)  C  J  C  jj,  we  have 
Re  (hfz)  =  Re  I  E  hfeZj 

VieTr(Ji) 

\je7T(Ji) 

1 

Siuj  cos  4> 


jen(J-t) 

>U - 1 - T  E  r*-1U)rl.iC0S(l) 

Si  m  ms  m  yjJ—3 


j£ir(Ji) 


1 


E 


—™U) 


jVj  -= 


SlU)  *■ — '  "  77 7 

jeA 

.  1  -  •  — 7t(j) 

>  — —  >  tutu  •  mm  — — 
■si-  E  J  J  K-h  >l.i 

j&Ji 

=  1, 


where the first inequality uses $|h_{j\ell}| \ge \underline{\eta}_j$ and $\phi_{j\ell} \in [-\phi, \phi]$ for $j \in J_\ell^1$. Since $z_j = 0$ for $j \notin J_\ell^1$, this shows that $z$ satisfies the first set of constraints in (4.1). A similar reasoning shows that $z$ satisfies the remaining three sets of constraints in (4.1). ■


Notice that the $z$ constructed in the proof of Theorem 4.1 is feasible for the further restriction of (4.1) whereby $z_j = 0$ for all $j \notin J$. This further restricted problem has the same (worst-case) approximation bound specified in Theorem 4.1.

Let us compare the two approximation bounds in Theorem 3.7 and Theorem 4.1. First, the required assumptions are different. On the one hand, the bound in Theorem 3.7 does not depend on $|h_{j\ell}|$, while the bound in Theorem 4.1 does. On the other hand, Theorem 3.7 requires the bounded angular spread

(4.2)  \<j)je  —  <  2<j>  \/j,t, 

for some $\phi < \pi/4$, while Theorem 4.1 allows $\phi < \pi/2$ and only requires the condition (4.2) for all $1 \le \ell \le M$ and $j \in J$, where $J$ is a pre-selected index set. Thus, the bounded angular spread condition required in Theorem 3.7 corresponds exactly to $|J| = n$. Hence, the assumptions required in the two theorems do not imply one another. Second, the two performance ratios are also different. Naturally, the final performance ratio in Theorem 4.1 depends on the choice of $J$ through the ratio $|J|/n$, so a large $J$ is preferred. Suppose that the assumptions of both theorems are satisfied and, for simplicity, that $\bar\eta_j = \underline{\eta}_j$ for all $j$. Then $|J| = n$ and $\phi < \pi/4$, in which case Theorem 4.1 gives a performance ratio of $1/\cos^2\phi$ while Theorem 3.7 gives $1/\cos(2\phi)$. Since $\cos(2\phi) = \cos^2\phi - \sin^2\phi \le \cos^2\phi$, we have $1/\cos(2\phi) \ge 1/\cos^2\phi$, showing that Theorem 4.1 gives a tighter approximation bound. However, this does not mean Theorem 4.1 is stronger than Theorem 3.7, since the two theorems hold under different assumptions in general.
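To make the comparison concrete, the two ratios can be tabulated for a few phase spreads. This is only an illustrative sketch under the simplifying assumptions stated above ($|J| = n$, equal magnitude bounds, $\phi < \pi/4$); the function names and the sampled angles are not from the paper.

```python
import math


def ratio_thm37(phi):
    """Bound 1/cos(2*phi) of Theorem 3.7(b); requires 0 <= phi < pi/4."""
    return 1.0 / math.cos(2.0 * phi)


def ratio_thm41(phi):
    """Bound 1/cos(phi)^2 of Theorem 4.1 in the special case |J| = n."""
    return 1.0 / math.cos(phi) ** 2


# Hypothetical sample of phase spreads, all below 45 degrees.
rows = []
for degrees in (0, 10, 20, 30, 40):
    phi = math.radians(degrees)
    rows.append((degrees, ratio_thm41(phi), ratio_thm37(phi)))

for degrees, r41, r37 in rows:
    print(f"phi = {degrees:2d} deg: Theorem 4.1 -> {r41:.3f}, Theorem 3.7 -> {r37:.3f}")
```

The gap between the two bounds widens as $\phi$ approaches $\pi/4$, where $1/\cos(2\phi)$ blows up while $1/\cos^2\phi$ stays below 2.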

We can specialize Theorem 4.1 to a typical situation in transmit beamforming. Consider a uniform linear transmit antenna array consisting of $n$ elements, and let us assume that the $M$ receivers are in a sector area from the far field, and the propagation is line-of-sight. By reciprocity, each steering vector $h_\ell$ will be Vandermonde with generator $e^{-\mathrm{i}2\pi (d/\lambda)\sin\theta_\ell}$ (see, e.g., [10]), where $d$ is the inter-antenna spacing, $\lambda$ is the wavelength, and $\theta_\ell$ is the angle of arrival of the $\ell$th receiving antenna. In a sector of approximately 60 degrees about the array broadside, we will have $|\theta_\ell| \le \pi/3$. Suppose that $d/\lambda = 1/2$. Then the steering vector corresponding to the $\ell$th receiving antenna will have the form

$$h_\ell = \big(1,\ e^{-\mathrm{i}\pi\sin\theta_\ell},\ \ldots,\ e^{-\mathrm{i}(n-1)\pi\sin\theta_\ell}\big)^T.$$


In this case, we have that $\phi_{j\ell} = -(j-1)\pi\sin\theta_\ell$ and $|h_{j\ell}| = 1$ for all $j$ and $\ell$. We can take, e.g.,

$$\bar\phi_j = 0, \qquad \phi = \bar{j}\,\pi \max_\ell |\sin\theta_\ell|, \qquad J = \{1,\ldots,\bar{j}+1\},$$

where $\bar{j} := \lfloor 1/\max_\ell |\sin\theta_\ell| \rfloor$. Thus, the assumptions of Theorem 4.1 are satisfied. Moreover, since $|\theta_\ell| \le \pi/3$ for all $\ell$, it follows that $|J| = \bar{j} + 1 \ge 2$. If $n$ is not large, say, $n \le 8$, then Theorem 4.1 gives a performance ratio of $n/(|J|\cos^2\phi) \le 16$.

More generally, if we can choose the partition $J_1,\ldots,J_N$ and the mapping $\pi_k$ in Theorem 4.1 such that


(■■■’*&>■■  -heJk  =  Vfc, 

then the performance ratio in Theorem 4.1 simplifies to $N/\cos^2\phi$. In particular, this holds when $|h_{j\ell}| = \eta > 0$ for all $j$ and $\ell$, or when $J = \{1,\ldots,n\}$ (so that $N = 1$) and $|h_{j\ell}|$ is independent of $\ell$ for all $j$, and more generally, when the channel coefficients periodically repeat their magnitudes. In general, we should choose the partition $J_1,\ldots,J_N$ and the mapping $\pi_k$ to make the performance ratio in Theorem 4.1 small. For example, if $J = J_1 = \{1,2\}$ and $\bar\eta_1 = 100$, $\bar\eta_2 = 10$, $\underline{\eta}_1 = 1$, $\underline{\eta}_2 = 10$, then $\pi_1(1) = 2$, $\pi_1(2) = 1$ is the better choice.

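5. Homogeneous QP in Maximization Form. Let us now consider the following complex norm maximization problem with convex homogeneous quadratic constraints:

$$\begin{array}{rll}
\upsilon_{\rm qp} := & \max & \|z\|^2 \\
& \text{s.t.} & \displaystyle\sum_{\ell \in I_i} |h_\ell^H z|^2 \le 1, \quad i = 1,\ldots,m, \\
& & z \in \mathbb{C}^n,
\end{array} \tag{5.1}$$

where $h_\ell \in \mathbb{C}^n$.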
To motivate this problem, consider the problem of designing an intercept beamformer$^5$ capable of suppressing signals impinging on the receiving antenna array from irrelevant or hostile emitters, e.g., jammers, whose steering vectors (spatial signatures, or "footprints") have been previously estimated, while achieving as high gain as possible for all other transmissions. The jammer suppression capability is captured in the constraints of (5.1), and $|I_i| > 1$ covers the case where a jammer employs more than one transmit antenna. The maximization of the objective $\|z\|^2$ can be motivated as follows. In intercept applications, the steering vector of the emitter of interest, $h$, is a priori unknown, and is naturally modelled as random. A pertinent optimization objective is then the average beamformer output power, measured by $E[|h^H z|^2]$. Under the assumption that the entries of $h$ are uncorrelated and have equal average power, it follows that $E[|h^H z|^2]$ is proportional to $\|z\|^2$, which is often referred to as the beamformer's white noise gain.

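Similar to (1.1), we let

$$H_i := \sum_{\ell \in I_i} h_\ell h_\ell^H$$

and consider the natural SDP relaxation of (5.1):

$$\begin{array}{rll}
\upsilon_{\rm sdp} := & \max & \mathrm{Tr}(Z) \\
& \text{s.t.} & \mathrm{Tr}(H_i Z) \le 1, \quad i = 1,\ldots,m, \\
& & Z \succeq 0,\ Z \text{ is complex and Hermitian.}
\end{array} \tag{5.2}$$

We are interested in lower bounds for the relaxation performance of the form

$$\upsilon_{\rm qp} \ge C\,\upsilon_{\rm sdp},$$

where $0 < C \le 1$. It is easily checked that (5.2) has an optimal solution.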

Let $Z^*$ be an optimal solution of (5.2). We will analyze the performance of the SDP relaxation using the following randomization procedure:

1. Generate a random vector $\xi \in \mathbb{C}^n$ from the complex-valued normal distribution $N_c(0, Z^*)$.

2. Let $z^*(\xi) = \xi \big/ \max_{1 \le i \le m} \sqrt{\xi^H H_i \xi}$.
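The two steps above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it assumes that the SDP solution is supplied through a factor $W$ with $Z^* = W W^H$ (so that $\xi = W g$ with $g \sim N_c(0, I)$ has covariance $Z^*$), and it represents each $H_i$ by the list of steering vectors in its group. The names `complex_normal`, `quad_form`, `randomize`, and the random test data are hypothetical.

```python
import math
import random


def complex_normal(rng):
    """One draw from the circular complex normal distribution N_c(0, 1)."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)


def quad_form(group, xi):
    """xi^H H_i xi, where H_i is the sum of h h^H over the vectors h in `group`."""
    return sum(
        abs(sum(hj.conjugate() * xj for hj, xj in zip(h, xi))) ** 2 for h in group
    )


def randomize(W, groups, rng):
    """Step 1: draw xi ~ N_c(0, W W^H). Step 2: scale xi to be feasible for (5.1)."""
    n, r = len(W), len(W[0])
    g = [complex_normal(rng) for _ in range(r)]
    xi = [sum(W[j][k] * g[k] for k in range(r)) for j in range(n)]
    scale = max(math.sqrt(quad_form(group, xi)) for group in groups)
    return [x / scale for x in xi]


# Hypothetical data: m = 4 constraints with one steering vector each, n = 3,
# and an arbitrary rank-2 factor W standing in for an actual SDP solution.
rng = random.Random(0)
n, m = 3, 4
groups = [[[complex_normal(rng) for _ in range(n)]] for _ in range(m)]
W = [[complex_normal(rng) for _ in range(2)] for _ in range(n)]

z = randomize(W, groups, rng)
worst = max(quad_form(group, z) for group in groups)
print("largest constraint value at z:", worst)
```

By construction the largest constraint value at $z^*(\xi)$ equals one, so every draw is feasible for (5.1); the analysis that follows bounds how much of $\mathrm{Tr}(Z^*)$ the scaled vector retains.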

First, we need the following lemma analogous to Lemmas 3.1 and 3.4.

Lemma 5.1. Let $H \in \mathbb{C}^{n \times n}$, $Z \in \mathbb{C}^{n \times n}$ be two Hermitian positive semidefinite matrices (i.e., $H \succeq 0$, $Z \succeq 0$). Suppose $\xi \in \mathbb{C}^n$ is a random vector generated from the complex-valued normal distribution $N_c(0, Z)$. Then, for any $\gamma > 0$,

$$\mathrm{Prob}\big(\xi^H H \xi > \gamma\, E(\xi^H H \xi)\big) \le \bar{r}\, e^{-\gamma}, \tag{5.3}$$

where $\bar{r} := \min\{\mathrm{rank}(H), \mathrm{rank}(Z)\}$.

5  Note  that  here  we  are  talking  about  a  receive  beamformer,  as  opposed  to  our  earlier  motivating  discussion  of 
transmit  beamformer  design. 
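Proof. If $H = 0$, then (5.3) is trivially true. Suppose $H \ne 0$. Then, as in the proof of Lemma 3.1, we have

$$\mathrm{Prob}\big(\xi^H H \xi > \gamma\, E(\xi^H H \xi)\big) = \mathrm{Prob}\Big(\sum_{i=1}^{\bar{r}} \bar\lambda_i |\bar\xi_i|^2 > \gamma\Big),$$

where $\bar\lambda_1 \ge \bar\lambda_2 \ge \cdots \ge \bar\lambda_{\bar{r}} \ge 0$ satisfy $\bar\lambda_1 + \cdots + \bar\lambda_{\bar{r}} = 1$ and each $\bar\xi_i \in \mathbb{C}$ has the complex-valued normal distribution $N_c(0, 1)$. Then

$$\begin{aligned}
\mathrm{Prob}\big(\xi^H H \xi > \gamma\, E(\xi^H H \xi)\big)
&\le \mathrm{Prob}\big(|\bar\xi_1|^2 > \gamma \text{ or } |\bar\xi_2|^2 > \gamma \text{ or } \cdots \text{ or } |\bar\xi_{\bar{r}}|^2 > \gamma\big) \\
&\le \sum_{i=1}^{\bar{r}} \mathrm{Prob}\big(|\bar\xi_i|^2 > \gamma\big) \\
&= \bar{r}\, e^{-\gamma},
\end{aligned}$$

where the last step uses (3.8). ■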
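A quick Monte Carlo experiment can illustrate the tail bound (5.3). The sketch below is an illustration under stated assumptions, not a proof: it takes $Z = I$ with $n = 3$, builds $H$ as the sum of two random rank-one terms (so $\bar{r} = 2$ generically), and uses $E(\xi^H H \xi) = \mathrm{Tr}(H)$. All names and parameter values are hypothetical.

```python
import math
import random


def complex_normal(rng):
    """One draw from the circular complex normal distribution N_c(0, 1)."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)


rng = random.Random(1)
n = 3
# H = h_1 h_1^H + h_2 h_2^H, generically of rank 2; Z = I has rank 3, so r_bar = 2.
h_vectors = [[complex_normal(rng) for _ in range(n)] for _ in range(2)]
r_bar = 2
# With Z = I, E(xi^H H xi) = Tr(H) = sum of squared moduli of all entries.
mean_value = sum(abs(c) ** 2 for h in h_vectors for c in h)

gamma = 2.0
trials = 20000
hits = 0
for _ in range(trials):
    xi = [complex_normal(rng) for _ in range(n)]
    value = sum(
        abs(sum(hj.conjugate() * xj for hj, xj in zip(h, xi))) ** 2
        for h in h_vectors
    )
    if value > gamma * mean_value:
        hits += 1

empirical = hits / trials
bound = r_bar * math.exp(-gamma)
print(f"empirical tail frequency {empirical:.4f} vs. bound {bound:.4f}")
```

The empirical frequency typically sits well below $\bar{r} e^{-\gamma}$, which is expected since the proof uses a crude union bound.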


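Theorem 5.2. For the complex QP (5.1) and its SDP relaxation (5.2), we have $\upsilon_{\rm sdp} = \upsilon_{\rm qp}$ if $m \le 3$ and otherwise

$$\upsilon_{\rm qp} \ge \frac{1}{4\ln(100K)}\, \upsilon_{\rm sdp},$$

where $K := \sum_{i=1}^{m} \min\{\mathrm{rank}(H_i), \sqrt{m}\}$.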


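Proof. By applying a suitable rank reduction procedure if necessary, we can assume that the rank $r$ of the optimal SDP solution $Z^*$ satisfies $r = 1$ if $m \le 3$ and $r \le \sqrt{m}$ if $m \ge 4$; see [9, §5]. Thus, if $m \le 3$, then $Z^* = z^*(z^*)^H$ for some $z^* \in \mathbb{C}^n$ and it is readily seen that $z^*$ is an optimal solution of (5.1), so that $\upsilon_{\rm sdp} = \upsilon_{\rm qp}$. Otherwise, we apply the randomization procedure to $Z^*$. By using Lemma 5.1, we have, for any $\gamma > 0$ and $\mu > 0$,

$$\begin{aligned}
&\mathrm{Prob}\Big(\max_{1 \le i \le m} \xi^H H_i \xi \le \gamma,\ \|\xi\|^2 \ge \mu\,\mathrm{Tr}(Z^*)\Big) \\
&\qquad \ge 1 - \sum_{i=1}^{m} \mathrm{Prob}\big(\xi^H H_i \xi > \gamma\, E(\xi^H H_i \xi)\big) - \mathrm{Prob}\big(\|\xi\|^2 < \mu\,\mathrm{Tr}(Z^*)\big) \\
&\qquad \ge 1 - K e^{-\gamma} - \mathrm{Prob}\big(\|\xi\|^2 < \mu\,\mathrm{Tr}(Z^*)\big),
\end{aligned} \tag{5.4}$$

where the last step uses $r \le \sqrt{m}$.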


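Let

$$\eta_j := \begin{cases} |\xi_j|^2 / Z^*_{jj}, & \text{if } Z^*_{jj} > 0; \\ 0, & \text{if } Z^*_{jj} = 0, \end{cases} \qquad j = 1,\ldots,n.$$

For simplicity, let us assume that $Z^*_{jj} > 0$ for all $j = 1,\ldots,n$. Since $\xi_j \sim N_c(0, Z^*_{jj})$, as we discussed in Subsection 3.2, $|\xi_j|$ follows a Weibull distribution with variance $Z^*_{jj}$ (see (3.8)), and therefore

$$\mathrm{Prob}(\eta_j \le t) = 1 - e^{-t} \qquad \forall\ t \in [0, \infty).$$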



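Hence,

$$E(\eta_j) = \int_0^\infty t e^{-t}\,dt = 1, \qquad E(\eta_j^2) = \int_0^\infty t^2 e^{-t}\,dt = 2, \qquad \mathrm{Var}(\eta_j) = 1.$$

Moreover,

$$E(|\eta_j - E(\eta_j)|) = \int_0^1 (1 - t) e^{-t}\,dt + \int_1^\infty (t - 1) e^{-t}\,dt = \frac{2}{e}.$$

Let us denote $\lambda_j = Z^*_{jj}/\mathrm{Tr}(Z^*)$, $j = 1,\ldots,n$, and $\eta := \sum_{j=1}^{n} \lambda_j \eta_j$. We have $E(\eta) = 1$ and

$$E(|\eta - E(\eta)|) = E\Big(\Big|\sum_{j=1}^{n} \lambda_j (\eta_j - E(\eta_j))\Big|\Big) \le \sum_{j=1}^{n} \lambda_j\, E(|\eta_j - E(\eta_j)|) = \frac{2}{e}.$$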

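Since, by Markov's inequality,

$$\mathrm{Prob}(|\eta - E(\eta)| > \alpha) \le \frac{E(|\eta - E(\eta)|)}{\alpha} \le \frac{2}{\alpha e}, \qquad \forall\ \alpha > 0,$$

we have

$$\mathrm{Prob}\big(\|\xi\|^2 < \mu\,\mathrm{Tr}(Z^*)\big) = \mathrm{Prob}(\eta < \mu) \le \mathrm{Prob}(|\eta - E(\eta)| > 1 - \mu) \le \frac{2}{e(1 - \mu)}, \qquad \text{for all } \mu \in (0, 1).$$

Substituting the above inequality into (5.4), we obtain

$$\mathrm{Prob}\Big(\max_{1 \le i \le m} \xi^H H_i \xi \le \gamma,\ \|\xi\|^2 \ge \mu\,\mathrm{Tr}(Z^*)\Big) \ge 1 - K e^{-\gamma} - \frac{2}{e(1 - \mu)}, \qquad \forall\ \mu \in (0, 1).$$

Setting $\mu = 1/4$ and $\gamma = \ln(100K)$ yields a positive right-hand side of 0.00898..., which then proves the desired bound. ■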
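The two constants used in this proof are easy to verify numerically. The sketch below is illustrative only: the midpoint-rule integrator, its step count, and the value of `K` are assumptions chosen for the check (the term $K e^{-\gamma}$ equals $0.01$ for any $K$ once $\gamma = \ln(100K)$).

```python
import math


def mean_abs_dev_exp(steps=200000, upper=50.0):
    """Midpoint-rule estimate of E|eta - 1| for eta with density e^{-t} on [0, inf).

    The integral of |t - 1| e^{-t} is truncated at `upper`, where the tail is negligible.
    """
    dt = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += abs(t - 1.0) * math.exp(-t) * dt
    return total


def success_probability_lower_bound(K, gamma, mu):
    """Right-hand side 1 - K e^{-gamma} - 2 / (e (1 - mu)) of the final inequality."""
    return 1.0 - K * math.exp(-gamma) - 2.0 / (math.e * (1.0 - mu))


mad = mean_abs_dev_exp()
K = 7.0  # hypothetical value; it cancels when gamma = ln(100 K)
rhs = success_probability_lower_bound(K, math.log(100.0 * K), 0.25)
print("E|eta - 1| estimate:", mad, " closed form 2/e:", 2.0 / math.e)
print("right-hand side for mu = 1/4, gamma = ln(100K):", rhs)
```

With a positive success probability, at least one realization $\xi$ satisfies both events, and scaling it gives a feasible point with $\|z\|^2 \ge (\mu/\gamma)\,\mathrm{Tr}(Z^*) = \upsilon_{\rm sdp}/(4\ln(100K))$.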


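The above proof technique also applies to the real case, i.e., $h_\ell \in \mathbb{R}^n$ and $z \in \mathbb{R}^n$. The main difference is that $\xi \sim N(0, Z^*)$, so that $|\bar\xi_i|^2$ in the proof of Lemma 5.1 and $\eta_j$ in the proof of Theorem 5.2 both follow a $\chi^2$ distribution with one degree of freedom. Then

$$\mathrm{Prob}\big(|\bar\xi_i|^2 > \gamma\big) = \int_{\sqrt{\gamma}}^{\infty} \frac{2 e^{-t^2/2}}{\sqrt{2\pi}}\,dt \le \int_{\sqrt{\gamma}}^{\infty} \frac{2 t e^{-t^2/2}}{\sqrt{2\pi\gamma}}\,dt = \sqrt{\frac{2}{\pi\gamma}}\; e^{-\gamma/2}, \qquad \forall\ \gamma > 0,$$

$E(\eta_j) = 1$, and

$$\begin{aligned}
E|\eta_j - E(\eta_j)| &= \int_0^\infty \frac{e^{-t/2}}{\sqrt{2\pi t}}\, |t - 1|\,dt \\
&= \frac{1}{\sqrt{2\pi}} \int_0^1 \frac{e^{-t/2}}{\sqrt{t}}\,dt - \frac{1}{\sqrt{2\pi}} \int_0^1 \sqrt{t}\, e^{-t/2}\,dt + \frac{1}{\sqrt{2\pi}} \int_1^\infty \sqrt{t}\, e^{-t/2}\,dt - \frac{1}{\sqrt{2\pi}} \int_1^\infty \frac{e^{-t/2}}{\sqrt{t}}\,dt \\
&\le 0.968,
\end{aligned}$$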
where in the last step we used integration by parts on the first and the fourth terms. This yields the analogous bound that, for any $\gamma \ge 1$ and $\mu \in (0, 1)$,

$$\begin{aligned}
\mathrm{Prob}\Big(\max_{1 \le i \le m} \xi^T H_i \xi \le \gamma,\ \|\xi\|^2 \ge \mu\,\mathrm{Tr}(Z^*)\Big)
&> 1 - K \sqrt{\frac{2}{\pi\gamma}}\; e^{-\gamma/2} - \frac{0.968}{1 - \mu} \\
&\ge 1 - K e^{-\gamma/2} - \frac{0.968}{1 - \mu},
\end{aligned}$$

where $K := \sum_{i=1}^{m} \min\{\mathrm{rank}(H_i), \sqrt{2m}\}$. Setting $\mu = 0.01$ and $\gamma = 2\ln(50K)$ yields a positive right-hand side of 0.0022... This in turn shows that $\upsilon_{\rm sdp} = \upsilon_{\rm qp}$ if $m \le 2$ (see the proof of Theorem 3.3) and otherwise

$$\upsilon_{\rm qp} \ge \frac{1}{200\ln(50K)}\, \upsilon_{\rm sdp}.$$

We note that, in the real case, a sharper bound of

$$\upsilon_{\rm qp} \ge \frac{1}{2\ln(2m\mu)}\, \upsilon_{\rm sdp},$$

where $\mu := \min\{m, \max_i \mathrm{rank}(H_i)\}$, was shown by Nemirovski, Roos and Terlaky [14] (also see [13, Theorem 4.7]), though the above proof seems simpler. Also, an example in [14] shows that the $O(1/\ln m)$ bound is tight (up to a constant factor) in the worst case. This example readily extends to the complex case by identifying $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ and observing that $|h_\ell^H z| \ge |\mathrm{Re}(h_\ell)^T \mathrm{Re}(z) + \mathrm{Im}(h_\ell)^T \mathrm{Im}(z)|$ for any $h_\ell, z \in \mathbb{C}^n$. Thus, in the complex case, the $O(1/\ln m)$ bound is also tight (up to a constant factor).

6. Discussion. In this paper, we have analyzed the worst-case performance of SDP relaxation and convex restriction for a class of NP-hard quadratic optimization problems with homogeneous quadratic constraints. Our analysis is motivated by important emerging applications in transmit beamforming for physical layer multicasting and sensor localization in wireless sensor networks. Our generalization (1.1) of the basic problem in [20] is useful, for it shows that the same convex approximation approaches and bounds hold in the case where each multicast receiver is equipped with multiple antennas. This scenario is becoming more pertinent with the emergence of small and cheap multi-antenna mobile terminals. Furthermore, our consideration of the related homogeneous QP maximization problem has direct application to the design of jam-resilient intercept beamformers. In addition to these timely topics, more traditional signal processing design problems can be cast in the same mathematical framework; see [20] for further discussions.

While theoretical worst-case analysis is very useful, empirical analysis of the ratio $\upsilon_{\rm qp}/\upsilon_{\rm sdp}$ through simulations with randomly generated steering vectors $\{h_\ell\}$ is often equally important. In the context of transmit beamforming for multicasting [20] for the case $|I_i| = 1\ \forall\, i$ (single receiving antenna per subscriber node), simulations have provided the following insights:

• For moderate values of $m$, $n$ (e.g., $m = 24$, $n = 8$), and independent and identically distributed (i.i.d.) complex-valued circular Gaussian (i.i.d. Rayleigh) entries of the steering vectors $\{h_\ell\}$, the average value of $\upsilon_{\rm qp}/\upsilon_{\rm sdp}$ is under 3, much lower than the worst-case value predicted by our analysis.
Fig. 6.1. Upper bound on $\upsilon_{\rm qp}/\upsilon_{\rm sdp}$ for $m = 8$, $n = 4$, 300 realizations of real Gaussian i.i.d. steering vector entries, solution constrained to be real. (Panel title: H=randn(4,8).)

• In all generated instances where all steering vectors have positive real and imaginary parts, the ratio $\upsilon_{\rm qp}/\upsilon_{\rm sdp}$ equals one (with error below $10^{-8}$). This is better than what our worst-case analysis predicts for limited phase spread (see Theorem 3.7).

• In experiments with measured VDSL channel data, for which the steering vectors follow a correlated log-normal distribution, $\upsilon_{\rm qp}/\upsilon_{\rm sdp} = 1$ in over 50% of instances.

• Our analysis shows that the worst-case performance ratio $\upsilon_{\rm qp}/\upsilon_{\rm sdp}$ is smaller in the complex case than in the real case ($O(m)$ versus $O(m^2)$). Moreover, this remains true with high probability when $\upsilon_{\rm qp}$ is replaced by its upper bound

$$\upsilon_{\rm ubqp} := \min_{k=1,\ldots,N} \|z(\xi^k)\|^2,$$

where $\xi^1,\ldots,\xi^N$ are generated by $N$ independent trials of the randomization procedure (see Subsections 3.1 and 3.2) and $N$ is taken sufficiently large. In our simulation, we used $N = 30nm$. Figure 6.1 shows our simulation results for the real Gaussian case.$^6$ It plots $\upsilon_{\rm ubqp}/\upsilon_{\rm sdp}$ for 300 independent realizations of i.i.d. real-valued Gaussian steering vector entries, for $m = 8$, $n = 4$. Figure 6.2 plots the corresponding histogram. Figures 6.3 and 6.4 show the corresponding results for i.i.d. complex-valued circular Gaussian steering vector entries.$^7$ Both the mean and the maximum of the upper bound $\upsilon_{\rm ubqp}/\upsilon_{\rm sdp}$ are lower in the complex case. The simulations indicate that SDP approximation is better in the complex case not only in the worst case but also on average.

$^6$ Here the SDP solution is constrained to be real-valued, and real Gaussian randomization is used.

$^7$ Here the SDP solutions are complex-valued, and complex Gaussian randomization is used.
Fig. 6.2. Histogram of the outcomes in Fig. 6.1. (Panel title: H=randn(4,8); horizontal axis: ubqp/sdp.)


The above empirical (worst-case and average-case) analysis complements our theoretical worst-case analysis of the performance of SDP relaxation for the class of problems considered herein.

Finally, we remark that our worst-case analysis of SDP performance is based on the assumption that the homogeneous quadratic constraints are concave (see (1.1)). Can we extend this analysis to general homogeneous quadratic constraints? The following example in $\mathbb{R}^2$ suggests that this is not possible.

Example 3: For any $L > 0$, consider the quadratic optimization problem with homogeneous quadratic constraints:

$$\begin{array}{ll}
\min & \|z\|^2 \\
\text{s.t.} & z_2^2 \ge 1, \quad z_1^2 - L z_1 z_2 \ge 1, \quad z_1^2 + L z_1 z_2 \ge 1, \\
& z \in \mathbb{R}^2.
\end{array} \tag{6.1}$$

The last two constraints imply $z_1^2 \ge L|z_1||z_2| + 1$ which, together with the first constraint $z_2^2 \ge 1$, yield $z_1^2 \ge L|z_1| + 1$ or, equivalently, $|z_1| \ge (L + \sqrt{L^2 + 4})/2$. So the optimal value of (6.1) is at least $1 + (L + \sqrt{L^2 + 4})^2/4$ (and in fact is equal to this). The natural SDP relaxation of (6.1) is

$$\begin{array}{ll}
\min & Z_{11} + Z_{22} \\
\text{s.t.} & Z_{22} \ge 1, \quad Z_{11} - L Z_{12} \ge 1, \quad Z_{11} + L Z_{12} \ge 1, \\
& Z \succeq 0.
\end{array}$$

Clearly, $Z = I_2$ is a feasible solution (and, in fact, an optimal solution) of this SDP, with an objective value of 2. Therefore, the SDP performance ratio for this example is at least $1/2 + (L + \sqrt{L^2 + 4})^2/8$, which can be arbitrarily large.
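The growth of this ratio with $L$ can be seen directly by evaluating the closed-form expressions in Example 3. The sketch below only evaluates the formulas stated in the example; the function names and the sampled values of $L$ are illustrative assumptions.

```python
import math


def qp_value(L):
    """Optimal value 1 + (L + sqrt(L^2 + 4))^2 / 4 of (6.1), as stated in Example 3."""
    t = (L + math.sqrt(L * L + 4.0)) / 2.0
    return 1.0 + t * t


def is_feasible(z1, z2, L, tol=1e-9):
    """Check the three constraints of (6.1) at the point (z1, z2)."""
    return (
        z2 * z2 >= 1.0 - tol
        and z1 * z1 - L * z1 * z2 >= 1.0 - tol
        and z1 * z1 + L * z1 * z2 >= 1.0 - tol
    )


SDP_VALUE = 2.0  # objective value of Z = I_2 in the SDP relaxation

ratios = {}
feasible = {}
for L in (1.0, 10.0, 100.0):
    t = (L + math.sqrt(L * L + 4.0)) / 2.0
    feasible[L] = is_feasible(t, 1.0, L)  # the point attaining the stated optimum
    ratios[L] = qp_value(L) / SDP_VALUE

for L in sorted(ratios):
    print(f"L = {L:6.1f}: ratio = {ratios[L]:.3f}, attaining point feasible: {feasible[L]}")
```

The ratio grows roughly like $L^2/2$, so no bound depending only on the problem dimensions can hold once the constraints are allowed to be indefinite.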
Fig. 6.3. Upper bound on $\upsilon_{\rm qp}/\upsilon_{\rm sdp}$ for $m = 8$, $n = 4$, 300 realizations of complex Gaussian i.i.d. steering vector entries. (Panel title: H=randn(4,8)+j*randn(4,8).)
Abstract: This paper considers the problem of downlink transmit beamforming for wireless transmission and downstream precoding for digital subscriber wireline transmission, in the context of common information broadcasting or multicasting applications wherein channel state information (CSI) is available at the transmitter. Unlike the usual "blind" isotropic broadcasting scenario, the availability of CSI allows transmit optimization. A minimum transmission power criterion is adopted, subject to prescribed minimum received signal-to-noise ratios (SNRs) at each of the intended receivers. A related max-min SNR "fair" problem formulation is also considered subject to a transmitted power constraint. It is proven that both problems are NP-hard; however, suitable reformulation allows the successful application of semidefinite relaxation (SDR) techniques. SDR yields an approximate solution plus a bound on the optimum value of the associated cost/reward. SDR is motivated from a Lagrangian duality perspective, and its performance is assessed via pertinent simulations for the case of Rayleigh fading wireless channels. We find that SDR typically yields solutions that are within 3-4 dB of the optimum, which is often good enough in practice. In several scenarios, SDR generates exact solutions that meet the associated bound on the optimum value. This is illustrated using measured very-high-bit-rate Digital Subscriber line (VDSL) channel data, and far-field beamforming for a uniform linear transmit antenna array.

Index Terms: Broadcasting, convex optimization, downlink beamforming, minimization of total radiation power, multicasting, semidefinite programming, semidefinite relaxation (SDR), very-high-bit-rate Digital Subscriber line (VDSL) precoding.


I.  Introduction 

Consider a transmitter that utilizes an antenna array to broadcast information to multiple radio receivers within a certain service area. The traditional approach to broadcasting is to radiate transmission power isotropically, or with a fixed directional pattern. However, future digital video/audio/data broadcasting and multicasting applications are likely to be based on subscription to services; hence, it is plausible to assume that the transmitter can acquire channel state information (CSI) for all its intended receivers.

The  goal  of  this  paper  is  to  develop  efficient  algorithms  for 
the  design  of  broadcasting  schemes  that  exploit  this  channel 
information  in  order  to  provide  better  performance  than  the 
traditional  approaches. 

Our design approach is based on providing Quality of Service (QoS) assurance to each of the receivers. Since the received signal-to-noise ratio (SNR) determines the maximum achievable data rate and (essentially) determines the probability of error, it is an effective measure of the QoS. We consider two basic design problems. The first seeks to minimize the total transmission power (and thus leakage to neighboring cochannel transmissions), subject to meeting (potentially different) constraints on the received SNR for each individual intended receiver. The second is a "fair" design problem in which we attempt to maximize the smallest receiver SNR over the intended receivers, subject to a bound on the transmitted power. We will show that both these problems are NP-hard, but we will also show that designs that are close to being optimal can be efficiently obtained by employing semidefinite relaxation (SDR) techniques.

Our designs are initially developed for a wireless broadcast scenario in which each user employs a single receive antenna and the channel is modeled as being flat in frequency and quasi-static in time. However, the designs are also appropriate on a per-tone basis for orthogonal-frequency-division multiplexing (OFDM) and related multicarrier systems, and, as we will show, they can be generalized in a straightforward manner to single-carrier systems transmitting over frequency-selective channels. In addition to wireless systems, applications of the proposed methodology also appear in downstream multicast transmission for multicarrier and single-carrier digital subscriber line (DSL) systems. In this context, (linear) precoding of multiple DSL loops in the same binder that wish to subscribe to a common service (e.g., news feed, video-conference, or movie multicast) can be employed to improve QoS and/or reduce far-end crosstalk (FEXT) interference to other loops in the binder. In scenarios in which the customer-premise equipment (CPE) receivers are not physically co-located (as in residential service) or cannot be coordinated (legacy CPE), multiuser decoding of the downstream transmission is not feasible, while transmit precoding is viable. The most important difference between DSL and the wireless multicast scenario is that DSL channels are diagonally dominated. Still, exploiting the crosstalk coupling to reduce FEXT levels to other loops in the binder can provide significant performance gains, especially if (cooperative or competitive) power control is implemented.

It is interesting to note that, as of today, Internet multicasting (using IP's Multicast Backbone, MBone) is performed at the network layer, e.g., via packet-level flooding or spanning-tree access of the participant nodes and any intermediate nodes needed to access the participants. To complement that approach, what we advocate herein can be interpreted as judicious physical layer multicasting, that is, enabled by i) the availability of multiple transmitting elements; ii) exploiting opportunities for joint beamforming/precoding; and iii) the availability of CSI at the transmitting node or one of its proxies. This is a cross-layer optimization approach that exploits information available at the physical layer to reduce relay retransmissions at the network layer, thus providing congestion relief and QoS guarantees.

Notation: We use lowercase boldface letters to denote column vectors and uppercase bold letters to denote matrices. $(\cdot)^T$ denotes transpose, while $(\cdot)^H$ denotes Hermitian (conjugate) transpose. $\mathrm{Re}\{\cdot\}$ extracts the real part of its argument, and $\mathrm{Im}\{\cdot\}$ the imaginary part.

II.  Data  Model  and  Problem  Statement 

Consider a wireless scenario incorporating a single transmitter
with $N$ antenna elements and $M$ receivers, each with a single
antenna. Let $\mathbf{h}_i$ denote the $N \times 1$ complex vector
that models the propagation loss and phase shift of the
frequency-flat quasi-static channel from each transmit antenna
to the receive antenna of user $i \in \{1, \ldots, M\}$, and let
$\mathbf{w}^H$ denote the beamforming weight vector applied to the $N$
transmitting antenna elements. If the signal to be transmitted
is zero-mean and white with unit variance, and if the noise1
at receiver $i$ is zero-mean and white with variance $\sigma_i^2$, then
the receiver SNR for the $i$th user is $|\mathbf{w}^H \mathbf{h}_i|^2 / \sigma_i^2$.
Let $\rho_{\min,i}$ be the prescribed minimum SNR for the $i$th user and
define the normalized channel vectors
$\tilde{\mathbf{h}}_i := \mathbf{h}_i / \sqrt{\rho_{\min,i}\,\sigma_i^2}$. Then
$|\mathbf{w}^H \mathbf{h}_i|^2 / \sigma_i^2 \ge \rho_{\min,i} \Leftrightarrow |\mathbf{w}^H \tilde{\mathbf{h}}_i|^2 \ge 1$.
Therefore, the design of the beamformer that minimizes the
transmitted power, subject to (possibly different) constraints
on the received SNR of each user, can be written as

$$\mathcal{Q}: \quad \min_{\mathbf{w} \in \mathbb{C}^N} \ \|\mathbf{w}\|_2^2 \quad \text{subject to: } |\mathbf{w}^H \tilde{\mathbf{h}}_i|^2 \ge 1, \quad i \in \{1, \ldots, M\}.$$

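We will denote an instance of problem $\mathcal{Q}$ as $\mathcal{Q}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M})$,
keeping in mind that $\tilde{\mathbf{h}}_i = \mathbf{h}_i / \sqrt{\rho_{\min,i}\,\sigma_i^2}$.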
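
The normalization above can be checked numerically. The following is a minimal sketch, assuming plain Python lists of complex numbers as vectors; the helper names (`normalize_channel`, `gain`) and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def normalize_channel(h, rho_min, sigma2):
    """Return h / sqrt(rho_min * sigma2), the normalized channel vector."""
    scale = math.sqrt(rho_min * sigma2)
    return [x / scale for x in h]


def gain(w, h):
    """Return |w^H h|^2 for complex vectors given as Python lists."""
    return abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2


# Hypothetical two-antenna example (values are illustrative only).
h = [1 + 1j, 0.5 - 0.2j]
w = [0.8 + 0.1j, 0.3j]
sigma2, rho_min = 0.5, 2.0

snr = gain(w, h) / sigma2
normalized_gain = gain(w, normalize_channel(h, rho_min, sigma2))

# The SNR target is met exactly when the normalized gain is at least 1.
meets_snr_target = snr >= rho_min
meets_unit_constraint = normalized_gain >= 1.0
```

The two Boolean flags always agree, because the normalized gain equals the SNR divided by the prescribed minimum SNR.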

Remark 1: One could think of imposing the stricter constraints
$\mathbf{w}^H \tilde{\mathbf{h}}_i = 1$, $\forall i$, in order to avoid the need for single-tap
equalization at the receivers. However, we are interested in the
practically important case of $M > N$, wherein the stricter
constraints generically yield an overdetermined system of
equations, and thus an infeasible problem. On the other hand, it is
easy to see that problem $\mathcal{Q}$ is always feasible, provided of course
that none of the channel vectors is identically zero.

1The noise may include unmodeled interference.

Problem  Q  is  formulated  under  the  assumption  that  the  design 
center  (usually  the  transmitter)  has  knowledge  of  the  channel 
vector $\mathbf{h}_i$ (and the noise variance $\sigma_i^2$) for each user. This can
be accomplished in a straightforward manner in fixed wireless
systems and time-division-duplex (TDD) systems. In other
systems, it can be accomplished through the use of beacon signals,
periodically transmitted from the broadcasting station (and
typically embedded in the transmission). The receiving radios can
then  feed  back  their  CSI  through  a  feedback  channel.  For  the 
purposes  of  this  paper,  we  will  assume  that  the  design  center 
has  perfect  knowledge  of  the  channel  vectors,  but  extensions  to 
cases  of  imperfect  knowledge  are  under  development. 

Problem $\mathcal{Q}$ is a quadratically constrained quadratic programming
(QCQP) problem, but unfortunately the constraints are not
convex.2  Nonconvexity,  per  se,  does  not  mean  that  the  problem  is 
difficult  to  solve;  however,  we  have  the  following  claim,  whose 
proof  can  be  found  in  Appendix  I. 

Claim  1:  The  QoS  problem  Q  is  NP-hard. 

The  implication  of  Claim  1  is  that  if  an  algorithm  could  solve 
an  arbitrary  instance  of  problem  Q  in  polynomial  time,  it  would 
then  be  possible  to  solve  a  whole  class  of  computationally  very 
difficult  problems  in  polynomial  time  [4].  The  current  scientific 
consensus  indicates  that  this  is  unlikely. 

A.  Review  of  Pertinent  Prior  Art 

The above problem is reminiscent of some closely related
problems. For $M = 1$, the optimum $\mathbf{w}$ is a matched filter. When
the scaled channel vectors $\tilde{\mathbf{h}}_i$ span a ball or ellipsoid about a
"nominal" channel vector,3 the problem can be transformed
exactly into a second-order cone program, and hence can be
efficiently solved [13]. Unfortunately, this transformation cannot
be employed in the case of finitely many channel vectors
(intended receivers).

Another  closely  related  work  is  that  in  [1]  (and  references 
therein),  which  considers  the  problem  of  multiuser  transmit 
beamforming for the cellular downlink. The key difference
between [1] and our formulation is that the authors of [1] consider
the  transmission  of  independent  information  to  each  of  the 
downlink  users,  whereas  we  focus  on  (common  information) 
multicast.  The  mathematical  problems  are  not  equivalent.  A 
fundamental  difference  is  that  our  problem  is  NP-hard,  whereas 
the  formulation  in  [1]  can  be  efficiently  solved.  To  further 
appreciate  the  difference  intuitively,  we  point  out  that  in  the 
generic  case  of  our  formulation  most  of  the  SNR  constraints 
will  be  inactive  at  the  optimum  (i.e.,  most  of  the  constraints  will 
be  oversatisfied).  Consider,  for  example,  the  case  of  two  closely 
located  receivers  with  different  SNR  requirements:  one  of  the 
two  associated  constraints  will  be  oversatisfied  at  the  optimum. 
On  the  other  hand,  it  is  proven  in  [1]  that  in  the  formulation  of 
[1]  the  constraints  are  always  met  with  equality  at  the  optimum. 
The important common denominator of our work and [1] is the
use  of  semidefinite  programming  tools. 

2This  is  easy  to  see  for  N  =  1 ,  in  which  case  each  constraint  requires  that 
the  magnitude  of  w  be  greater  than  a  constant. 

3This  implies  a  continuum  of  intended  receivers. 
Transmit beamforming for the dissemination of common
information to multiple users has been considered in the Ph.D.
dissertation of Lopez [7, ch. 5]. Lopez proposed maximizing
the sum of received SNRs, which is equivalent to maximizing
the average SNR over all users. This formulation leads to a
principal component computational problem for the optimum
beamformer, which is relatively simple to solve. The drawback is that
quality of service cannot be guaranteed to all users in this way.
This is important, because the weakest user link determines the
common information rate. Still, the work of Lopez is the closest
in spirit to ours, and for this reason we will include the
maximum average SNR approach in our performance evaluations in
Section VIII (see Table V).


III.  Relaxation 

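Toward solving our problem, we first recast it as follows:

$$\min_{\mathbf{w} \in \mathbb{C}^N} \ \operatorname{trace}(\mathbf{w}\mathbf{w}^H) \quad \text{subject to: } \operatorname{trace}(\mathbf{w}\mathbf{w}^H \mathbf{Q}_i) \ge 1, \quad i \in \{1, \ldots, M\}$$

where we have used the fact that
$\tilde{\mathbf{h}}_i^H \mathbf{w}\mathbf{w}^H \tilde{\mathbf{h}}_i = \operatorname{trace}(\tilde{\mathbf{h}}_i^H \mathbf{w}\mathbf{w}^H \tilde{\mathbf{h}}_i) = \operatorname{trace}(\mathbf{w}\mathbf{w}^H \tilde{\mathbf{h}}_i \tilde{\mathbf{h}}_i^H)$,
and we have defined $\mathbf{Q}_i := \tilde{\mathbf{h}}_i \tilde{\mathbf{h}}_i^H$. Now consider the following
reformulation of the problem: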

$$\min_{\mathbf{X} \in \mathbb{C}^{N \times N}} \ \operatorname{trace}(\mathbf{X}) \quad \text{subject to: } \operatorname{trace}(\mathbf{X}\mathbf{Q}_i) \ge 1, \ i \in \{1, \ldots, M\}, \quad \mathbf{X} \succeq 0, \quad \operatorname{rank}(\mathbf{X}) = 1$$

where now $\mathbf{X}$ is an $N \times N$ complex matrix, and the inequality
$\mathbf{X} \succeq 0$ means that the matrix $\mathbf{X}$ is symmetric positive
semidefinite. Note that, in the above equivalent formulation of our
problem, the cost function is linear in $\mathbf{X}$; the trace constraints
are linear inequalities in $\mathbf{X}$, and the set of symmetric positive
semidefinite matrices is convex; however, the rank constraint
on $\mathbf{X}$ is not convex.4 The important observation is that the
above problem is in a form suitable for semidefinite relaxation
(SDR) (see, e.g., [9] and references therein); that is, dropping
the rank-one constraint, one obtains the relaxed problem

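$$\min_{\mathbf{X} \in \mathbb{C}^{N \times N}} \ \operatorname{trace}(\mathbf{X}) \quad \text{subject to: } \operatorname{trace}(\mathbf{X}\mathbf{Q}_i) \ge 1, \ i \in \{1, \ldots, M\}, \ \text{and } \mathbf{X} \succeq 0$$

which is a semidefinite programming problem (SDP), albeit not
yet in standard form. In order to put it in standard form, we add
$M$ "slack" variables $s_i \in \mathbb{R}$, $i \in \{1, \ldots, M\}$, one for each trace
constraint. In this way, we obtain the program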


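$$\mathcal{Q}_r: \quad \min_{\mathbf{X} \in \mathbb{C}^{N \times N},\ s_i \in \mathbb{R}} \ \operatorname{vec}(\mathbf{I}_N)^T \operatorname{vec}(\mathbf{X})$$
$$\text{s.t.: } \operatorname{vec}(\mathbf{Q}_i^T)^T \operatorname{vec}(\mathbf{X}) - s_i = 1, \quad i \in \{1, \ldots, M\}$$
$$s_i \ge 0, \ i \in \{1, \ldots, M\}, \ \text{and } \mathbf{X} \succeq 0$$

which is now expressed in a standard form used by SDP solvers,
such as SeDuMi [11]. Here, $\mathbf{I}_N$ is the identity matrix of size
$N \times N$.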
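
The two identities used in this section, $|\mathbf{w}^H\mathbf{h}|^2 = \operatorname{trace}(\mathbf{w}\mathbf{w}^H\mathbf{h}\mathbf{h}^H)$ and $\operatorname{trace}(\mathbf{X}\mathbf{Q}) = \operatorname{vec}(\mathbf{Q}^T)^T\operatorname{vec}(\mathbf{X})$, can be verified on a small example. The sketch below assumes matrices stored as lists of rows; the helper functions and the test vectors are hypothetical and serve only as an illustration.

```python
def outer(a, b):
    """Return the matrix a b^H as a list of rows."""
    return [[ai * bj.conjugate() for bj in b] for ai in a]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def vec(A):
    """Stack the columns of A into one list (column-major order)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def transpose(A):
    """Plain (non-conjugate) transpose."""
    return [list(col) for col in zip(*A)]


# Illustrative vectors only.
w = [0.8 + 0.1j, 0.3j, -0.2 + 0.4j]
h = [1 + 1j, 0.5 - 0.2j, -0.7j]

X = outer(w, w)  # rank-one matrix X = w w^H
Q = outer(h, h)  # Q = h h^H

gain = abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2
trace_form = trace(matmul(X, Q))
vec_form = sum(q * x for q, x in zip(vec(transpose(Q)), vec(X)))
```

All three quantities (`gain`, `trace_form`, `vec_form`) agree up to rounding, which is why the SNR constraints become linear in $\mathbf{X}$.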

SDP problems can be efficiently solved using interior point
methods, at a complexity cost that is at most $O((M + N^2)^{3.5})$
and is usually much less. SeDuMi [11] is a MATLAB
implementation of modern interior point methods for SDP that is
particularly efficient for up to moderate-sized problems, as is the
case in our context. Typical run times for realistic choices of $N$
and $M$ are under 1/10 s, on a typical personal computer.

IV.  Algorithm 

Due to the relaxation, the matrix $\mathbf{X}_{opt}$ obtained by solving
the SDP in problem $\mathcal{Q}_r$ will not be rank one in general. If it
is, then its principal component will be the optimal solution
to the original problem. If not, then $\operatorname{trace}(\mathbf{X}_{opt})$ is a lower
bound on the power needed to satisfy the constraints. This
comes from the fact that we have removed one of the original
problem's constraints. Researchers in optimization have
recently developed ways of generating good solutions to the
original problem, $\mathcal{Q}$, from $\mathbf{X}_{opt}$ [9], [12], [15]. This process
is based on randomization: using $\mathbf{X}_{opt}$ to generate a set of
candidate weight vectors, $\{\mathbf{w}_\ell\}$, from which the "best" solution
will be selected. We consider three methods for generating the
$\mathbf{w}_\ell$'s, which have been designed so that their computational
cost is negligible compared to that of computing $\mathbf{X}_{opt}$. (For
consistency, the principal component is also included in the set
of candidates.) In the first method (randA), we calculate the
eigen-decomposition of $\mathbf{X}_{opt} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$ and choose $\mathbf{w}_\ell$ such
that $\mathbf{w}_\ell = \mathbf{U}\boldsymbol{\Sigma}^{1/2}\mathbf{e}_\ell$, where the elements of $\mathbf{e}_\ell$ are independent
random variables, uniformly distributed on the unit circle in
the complex plane; i.e., $[\mathbf{e}_\ell]_i = e^{j\theta_{\ell,i}}$, where the $\theta_{\ell,i}$ are
independent and uniformly distributed on $[0, 2\pi)$. This ensures that
$\mathbf{w}_\ell^H \mathbf{w}_\ell = \operatorname{trace}(\mathbf{X}_{opt})$, irrespective of the particular realization
of $\mathbf{e}_\ell$. In the second method (randB), inspired by Tseng [12],
we choose $\mathbf{w}_\ell$ such that $[\mathbf{w}_\ell]_i = \sqrt{[\mathbf{X}_{opt}]_{ii}}\,[\mathbf{e}_\ell]_i$, which ensures
that $|[\mathbf{w}_\ell]_i|^2 = [\mathbf{X}_{opt}]_{ii}$. The third method (randC), motivated
by successful applications in related QCQP problems [8], uses
$\mathbf{w}_\ell = \mathbf{U}\boldsymbol{\Sigma}^{1/2}\mathbf{v}_\ell$, where $\mathbf{v}_\ell$ is a vector of zero-mean,
unit-variance complex circularly symmetric uncorrelated Gaussian
random variables. This ensures that $E[\mathbf{w}_\ell \mathbf{w}_\ell^H] = \mathbf{X}_{opt}$ [8].
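
The three candidate generators can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the eigen-decomposition is taken as given (eigenvectors as a list of columns `U_cols`, eigenvalues as `sigma`), and the 2 x 2 example matrix, the function names, and the random seed are all hypothetical.

```python
import cmath
import math
import random


def unit_circle(n, rng):
    """n independent points uniformly distributed on the complex unit circle."""
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]


def combine(U_cols, sigma, coeffs):
    """Return U Sigma^{1/2} coeffs, with U given as a list of eigenvector columns."""
    n = len(U_cols[0])
    return [sum(U_cols[k][i] * math.sqrt(sigma[k]) * coeffs[k]
                for k in range(len(sigma))) for i in range(n)]


def rand_a(U_cols, sigma, rng):
    """randA: w = U Sigma^{1/2} e, with the entries of e on the unit circle."""
    return combine(U_cols, sigma, unit_circle(len(sigma), rng))


def rand_b(X_diag, rng):
    """randB: [w]_i = sqrt([X]_ii) [e]_i, using only the diagonal of X."""
    e = unit_circle(len(X_diag), rng)
    return [math.sqrt(d) * ei for d, ei in zip(X_diag, e)]


def rand_c(U_cols, sigma, rng):
    """randC: w = U Sigma^{1/2} v, with v zero-mean, unit-variance complex Gaussian."""
    s = math.sqrt(0.5)  # standard deviation of each real and imaginary part
    v = [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in sigma]
    return combine(U_cols, sigma, v)


# Hypothetical X_opt = [[2, 1], [1, 2]] with a known eigen-decomposition.
r = 1.0 / math.sqrt(2.0)
U_cols = [[r, r], [r, -r]]  # orthonormal eigenvectors
sigma = [3.0, 1.0]          # eigenvalues, so trace(X_opt) = 4
X_diag = [2.0, 2.0]
```

With this example, every randA draw has squared norm 4 (the trace of the example matrix) and every randB entry has squared modulus 2, in line with the properties stated above; randC draws have a random norm.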

For both randA and randB, $\|\mathbf{w}_\ell\|_2^2 = \operatorname{trace}(\mathbf{X}_{opt})$, and
hence when $\operatorname{rank}(\mathbf{X}_{opt}) > 1$, at least one of the constraints
$|\mathbf{w}_\ell^H \tilde{\mathbf{h}}_i|^2 \ge 1$ will be violated.5 However, a feasible weight
vector can be found by simply scaling $\mathbf{w}_\ell$ so that all the
constraints are satisfied. Under randC, $\|\mathbf{w}_\ell\|_2^2$ depends on the
particular realization of $\mathbf{v}_\ell$, but again the resulting $\mathbf{w}_\ell$ can be
scaled to the minimum length necessary to satisfy the
constraints. The "best" of these randomly generated weight vectors
is the one that requires the smallest scaling. For convenience,
we have summarized the algorithm in Table I, which includes a
simple MATLAB interface to SeDuMi [11] for the solution of
the semidefinite relaxation, $\mathcal{Q}_r$. We point out that we have not
yet been able to obtain theoretical a priori bounds on the extent

4The  sum  of  two  rank-one  matrices  has  generic  rank  two. 


5Recall that because of the relaxation, $\operatorname{trace}(\mathbf{X}_{opt})$ is a lower bound on the
energy of the optimal weight vector for the original problem.
TABLE I

Broadcast QoS Beamforming via SDR: Algorithm

• Solve the relaxed problem:

A simple MATLAB interface for SeDuMi is as follows:

% Input Data:
%   H: N by M, columns are the scaled channels h_i/sqrt(rho_min_i*sigma_i^2)
% Output Data:
%   Xopt: the solution to the SDR
[N, M] = size(H);                   % number of antennas, number of users
vecQs = [];
for i = 1:M,
    Qi = H(:,i)*H(:,i)';            % Q_i = h_i*h_i^H
    vecQs = [vecQs, vec(Qi.')];
end;
A = [-eye(M), vecQs.'];             % -s_i + trace(X*Q_i) = 1, one row per user
b = ones(M,1);
c = [zeros(M,1); vec(eye(N))];      % objective: trace(X)
K.l = M; K.s = N; K.scomplex = 1;   % M nonnegative slacks, one complex PSD block
[x_opt, y_opt, info] = sedumi(A, b, c, K);
Xopt = mat(x_opt(M+1:end));

• Randomization:

Use randA and/or randB, randC to generate the candidates w_ell.

For each w_ell, find the most violated constraint.

Scale w_ell so that that constraint is satisfied with equality.

Pick the w_ell with the smallest norm.

of  the  suboptimality  of  solutions  generated  in  this  way,  but  our 
simulation  results  are  quite  encouraging. 
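
The scaling and selection step of Table I can be sketched as below. This is a minimal illustration, assuming the candidates have already been generated and that no candidate is orthogonal to a channel vector (which would make the scaling undefined); the function names and the toy channels are hypothetical.

```python
def gain(w, h):
    """Return |w^H h|^2."""
    return abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2


def power(w):
    """Return ||w||^2."""
    return sum(abs(wi) ** 2 for wi in w)


def scale_to_feasible(w, channels):
    """Scale w so that its most violated constraint |w^H h_i|^2 >= 1 is tight."""
    worst = min(gain(w, h) for h in channels)  # assumed nonzero
    factor = 1.0 / worst ** 0.5
    return [factor * wi for wi in w]


def best_candidate(candidates, channels):
    """Return the scaled candidate with the smallest transmitted power."""
    scaled = [scale_to_feasible(w, channels) for w in candidates]
    return min(scaled, key=power)


# Toy normalized channels and candidates (illustrative values only).
channels = [[1 + 0j, 0j], [0j, 2 + 0j], [1 + 0j, 1 + 0j]]
candidates = [[1 + 0j, 1 + 0j], [1 + 0j, 0.5 + 0j]]

w_best = best_candidate(candidates, channels)
```

In this toy case the second candidate wins with power 1.25, against 2 for the first. Dividing the winning power by the trace of the SDR solution would give the upper bound on the power boost used in the simulations.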

V.  Max-Min  Fair  Beamforming 

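We now consider the related problem of maximizing the
minimum received SNR over all receivers, subject to a bound on the
transmitted power. That is,

$$\mathcal{F}: \quad \max_{\mathbf{w} \in \mathbb{C}^N} \ \min_{i} \left\{ \frac{|\mathbf{w}^H \mathbf{h}_i|^2}{\sigma_i^2} \right\}_{i=1}^{M} \quad \text{subject to: } \|\mathbf{w}\|_2^2 \le P.$$

It is easy to see that the constraint in problem $\mathcal{F}$ should be met
with equality at an optimum, for otherwise $\mathbf{w}$ could be scaled up,
thereby improving the objective and contradicting optimality.
Thus, we can focus on the equality-constrained problem. With
a scaling of the optimization variable $\mathbf{w} = \sqrt{P}\,\bar{\mathbf{w}}$, the
equality-constrained problem can be equivalently written as

$$P \, \max_{\bar{\mathbf{w}}} \ \min_{i} \left\{ \frac{|\bar{\mathbf{w}}^H \mathbf{h}_i|^2}{\sigma_i^2} \right\}_{i=1}^{M} \quad \text{subject to: } \|\bar{\mathbf{w}}\|_2^2 = 1.$$

It is clear that $P$ is immaterial with respect to optimization;
the solution scales up with $\sqrt{P}$, while the optimum value
scales up with $P$. We will denote an instance of problem $\mathcal{F}$
as $\mathcal{F}(\{\mathbf{h}_i/\sigma_i\}_{i=1}^{M}, P)$. Let $\mathbf{w}_Q$ be a solution to $\mathcal{Q}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M})$,
and $P_Q$ the associated minimum transmitted power. Consider
$\mathcal{F}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M}, P_Q)$, that is

$$\max_{\mathbf{w}} \ \min_{i} \left\{ |\mathbf{w}^H \tilde{\mathbf{h}}_i|^2 \right\}_{i=1}^{M} \quad \text{subject to: } \|\mathbf{w}\|_2^2 = P_Q$$

and let $\mathbf{w}_F$ denote an optimal solution. Since $\mathbf{w}_Q$ already attains
$|\mathbf{w}_Q^H \tilde{\mathbf{h}}_i|^2 \ge 1$, $\forall i$, it follows that $|\mathbf{w}_F^H \tilde{\mathbf{h}}_i|^2 \ge 1$, $\forall i$. Hence, $\mathbf{w}_F$
also satisfies the constraints of the QoS formulation, and at the
same power as $\mathbf{w}_Q$. It follows that $\mathbf{w}_F$ is equivalent to $\mathbf{w}_Q$. This
shows Claim 2.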

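Claim 2: $\mathcal{F}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M}, P)$ is equivalent to $\mathcal{Q}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M})$ up
to scaling. In the special case that $\rho_{\min,i} = \rho_{\min}$, $\forall i$, we have
that $\tilde{\mathbf{h}}_i = (\mathbf{h}_i/\sigma_i)/\sqrt{\rho_{\min}}$, $\forall i$, and hence $\mathcal{F}(\{\mathbf{h}_i/\sigma_i\}_{i=1}^{M}, P)$
is equivalent to $\mathcal{Q}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M})$ up to scaling.

Corollary 1: One way to solve the max-min fair
problem $\mathcal{F}(\{\mathbf{h}_i/\sigma_i\}_{i=1}^{M}, P)$ is to solve the QoS problem
$\mathcal{Q}(\{\mathbf{h}_i/\sigma_i\}_{i=1}^{M})$, then scale the resulting solution to the desired
power $P$. Conversely, scaling the solution of $\mathcal{F}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M}, P)$
yields a solution to $\mathcal{Q}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M})$, even in the case of unequal
$\rho_{\min,i}$.

Remark 2: It is important not to lose sight of the fact
that $\mathcal{F}(\{\mathbf{h}_i/\sigma_i\}_{i=1}^{M}, P)$ is not equivalent up to scaling to
$\mathcal{Q}(\{\tilde{\mathbf{h}}_i\}_{i=1}^{M})$ when the $\rho_{\min,i}$'s are unequal. This can be
intuitively appreciated by noting that the max-min fair formulation
aims to maximize the minimum received SNR, without regard
to the individual SNR constraints. The QoS formulation, on the
other hand, explicitly guarantees the prescribed minimum SNR
level at each node.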

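From the above, and Claim 1, Claim 3 follows.

Claim 3: The max-min fair problem $\mathcal{F}$ is NP-hard.

If the QoS problem could be solved exactly, there would have
been no need for a separate algorithm for the max-min fair
problem. However, we can only solve the QoS problem
approximately (cf. randomization postprocessing of the
generally higher rank solution). Due to this, it is of interest to
develop a customized SDR algorithm directly for the max-min fair
problem. Using the fact that $\mathbf{h}_i^H \mathbf{w}\mathbf{w}^H \mathbf{h}_i = \operatorname{trace}(\mathbf{w}\mathbf{w}^H \mathbf{h}_i \mathbf{h}_i^H)$,
and defining $\mathbf{Q}_i := \mathbf{h}_i \mathbf{h}_i^H / \sigma_i^2$, we recast the max-min fair
problem as follows:

$$\max_{\mathbf{X} \in \mathbb{C}^{N \times N}} \ \min_{i = 1, \ldots, M} \operatorname{trace}(\mathbf{X}\mathbf{Q}_i) \quad \text{subject to: } \operatorname{trace}(\mathbf{X}) = P, \quad \mathbf{X} \succeq 0, \quad \operatorname{rank}(\mathbf{X}) = 1.$$

Dropping the rank constraint, we obtain the relaxation

$$\max_{\mathbf{X} \in \mathbb{C}^{N \times N}} \ \min_{i = 1, \ldots, M} \operatorname{trace}(\mathbf{X}\mathbf{Q}_i) \quad \text{subject to: } \operatorname{trace}(\mathbf{X}) = P, \quad \mathbf{X} \succeq 0.$$

Introducing an additional variable, $t$, this relaxation can be
equivalently written as

$$\max_{\mathbf{X} \in \mathbb{C}^{N \times N},\ t \in \mathbb{R}} \ t \quad \text{subject to: } \operatorname{trace}(\mathbf{X}\mathbf{Q}_i) \ge t, \ \forall i \in \{1, \ldots, M\}, \quad \operatorname{trace}(\mathbf{X}) = P, \quad \mathbf{X} \succeq 0.$$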
TABLE II

Broadcast Max-Min Beamforming via SDR: Algorithm

• Solve the relaxed problem:

A suitable MATLAB interface for SeDuMi is as follows:

% Input Data:
%   H: N by M, columns are the scaled channels h_i/sigma_i
%   P: scalar, the total transmit power constraint
% Output Data:
%   Xopt: the solution to the SDR
%   t_opt: the optimal value of t in the SDR (upper bound on the min SNR)
[N, M] = size(H);                   % number of antennas, number of users
vecQs = [];
for i = 1:M,
    Qi = H(:,i)*H(:,i)';            % Q_i = h_i*h_i^H/sigma_i^2
    vecQs = [vecQs vec(Qi.')];
end;
A1 = [-ones(M,1); 0];               % coefficients of t
A2 = [-eye(M); zeros(1,M)];         % coefficients of the slacks s_i
A3 = [vecQs.'; vec(eye(N)).'];      % trace(X*Q_i) rows, then trace(X)
A = [A1 A2 A3];
b = [zeros(M,1); P];
c = [-1; zeros(M+N*N,1)];           % minimize -t, i.e., maximize t
K.l = M+1; K.s = N; K.scomplex = 1;
[x_opt, y_opt, info] = sedumi(A, b, c, K);
Xopt = mat(x_opt(M+2:end));
t_opt = x_opt(1)

• Randomization:

Use randA and/or randB, randC to generate the candidates w_ell.
Scale each w_ell so that norm(w_ell)^2 = P.

Pick the one that yields the largest min(abs(w_ell'*H)).

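Further introducing $M$ nonnegative real slack variables, one
for each inequality constraint, we convert the problem to an
equivalent one involving only equality, nonnegativity, and
positive-semidefinite constraints

$$\mathcal{F}_r: \quad \max_{\mathbf{X} \in \mathbb{C}^{N \times N},\ t \in \mathbb{R},\ s_i \in \mathbb{R}} \ t$$
$$\text{subject to: } \operatorname{trace}(\mathbf{X}\mathbf{Q}_i) - t - s_i = 0, \ s_i \ge 0, \ \forall i \in \{1, \ldots, M\}, \quad \operatorname{trace}(\mathbf{X}) = P, \quad \mathbf{X} \succeq 0.$$

This problem is formatted for direct solution via SeDuMi
[11]. Table II provides a suitable MATLAB interface for
solving this relaxation. Postprocessing of the solution of the
relaxed problem to approximate the solution of the original
max-min-fair problem can be accomplished using randA,
randB, and randC, but the selection criterion is different (see
Table II).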
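
The different selection criterion can be sketched as follows: each candidate is rescaled to the power budget, and the one with the largest minimum gain is kept. This is an illustrative sketch only; the function names and the toy data are hypothetical, and the channels are assumed to be already scaled by the noise standard deviations.

```python
def gain(w, g):
    """Return |w^H g|^2, where g is a noise-scaled channel h_i / sigma_i."""
    return abs(sum(wi.conjugate() * gi for wi, gi in zip(w, g))) ** 2


def scale_to_power(w, P):
    """Scale w so that ||w||^2 = P."""
    current = sum(abs(wi) ** 2 for wi in w)
    factor = (P / current) ** 0.5
    return [factor * wi for wi in w]


def best_max_min(candidates, channels, P):
    """Return (w, min_i |w^H g_i|^2) for the candidate with the largest minimum gain."""
    scored = []
    for w in candidates:
        w_p = scale_to_power(w, P)
        scored.append((min(gain(w_p, g) for g in channels), w_p))
    worst_gain, w_best = max(scored, key=lambda pair: pair[0])
    return w_best, worst_gain


# Toy data (illustrative values only).
channels = [[1 + 0j, 0j], [0j, 2 + 0j], [1 + 0j, 1 + 0j]]
candidates = [[1 + 0j, 1 + 0j], [1 + 0j, 0.5 + 0j]]

w_best, worst_gain = best_max_min(candidates, channels, P=1.0)
```

Here the second candidate is selected, with a minimum SNR of 0.8 at unit power, against 0.5 for the first.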


In closing this section, we would like to point out connections
between problems $\mathcal{F}$ and $\mathcal{F}_r$ and the problem of maximizing
the common mutual information of the (nondegraded) Gaussian
broadcast channel in which the transmitter has $N$ antennas and
each of the $M$ (noncooperative) receivers has a single antenna.
If $\mathbf{X}$ denotes the covariance of the transmitted signal, then the
maximum achievable common information rate (in the sense of
Shannon) can be written as (see, e.g., [6] and references therein)

$$C := \max_{\mathbf{X} \succeq 0,\ \operatorname{trace}(\mathbf{X}) \le P} \ \min_{i} \left\{ \log\left(1 + \frac{\mathbf{h}_i^H \mathbf{X} \mathbf{h}_i}{\sigma_i^2}\right) \right\}_{i=1}^{M}.$$

Alternatively, we can rewrite this max-min problem as

$$\max \ t \quad \text{subject to } \mathbf{X} \succeq 0, \ \operatorname{trace}(\mathbf{X}) \le P, \ \log\left(1 + \frac{\mathbf{h}_i^H \mathbf{X} \mathbf{h}_i}{\sigma_i^2}\right) \ge t, \ \forall i.$$

By the monotonicity of the "log" function, the above problem
is further equivalent to

$$\max \ t \quad \text{subject to } \mathbf{X} \succeq 0, \ \operatorname{trace}(\mathbf{X}) \le P, \ \frac{\mathbf{h}_i^H \mathbf{X} \mathbf{h}_i}{\sigma_i^2} \ge t, \ \forall i$$

in the sense that they yield the same optimal transmit covariance
matrix $\mathbf{X}$. The latter problem is identical to problem $\mathcal{F}_r$. In
other words, the semidefinite relaxation of problem $\mathcal{F}$ actually
yields a transmit covariance matrix that achieves the maximum
common information rate $C$. In a similar manner, we can argue
that the rank-one transmit covariance matrix obtained from
problem $\mathcal{F}$ achieves the maximum common information rate
under the restriction that beamforming is employed. However,
the latter rate can be significantly lower than $C$ for a large
number of users [6]. Nonetheless, from a practical perspective,
beamforming is attractive because it is simple to implement,6
requiring only a single standard additive white Gaussian noise
(AWGN) channel encoder and decoder. In contrast, achieving
the maximum common information rate $C$ in general requires
a higher rank transmit covariance matrix $\mathbf{X}$. In that case, a
weighted sum of multiple independent signals is transmitted
from each antenna, with each independent signal requiring
a separate AWGN channel encoder and decoder. Hence, the
beamforming strategy considered in this paper trades off a
potential reduction in the maximum common information rate
for implementation simplicity.
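
For a rank-one covariance $\mathbf{X} = \mathbf{w}\mathbf{w}^H$, the quantity $\mathbf{h}_i^H \mathbf{X} \mathbf{h}_i$ reduces to $|\mathbf{w}^H \mathbf{h}_i|^2$, so the common rate under beamforming is set by the weakest user. The sketch below evaluates it for a given beamformer; the base-2 logarithm (rate in bits per channel use) is an assumption, since the text leaves the base of the logarithm unspecified, and the function name and data are hypothetical.

```python
import math


def beamforming_common_rate(w, channels, noise_vars):
    """Return min_i log2(1 + |w^H h_i|^2 / sigma_i^2) for a fixed beamformer w."""
    rates = []
    for h, s2 in zip(channels, noise_vars):
        g = abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2
        rates.append(math.log2(1.0 + g / s2))
    return min(rates)


# Toy data (illustrative values only).
channels = [[1 + 0j, 0j], [0j, 2 + 0j], [1 + 0j, 1 + 0j]]
noise_vars = [1.0, 1.0, 1.0]
w = [1 + 0j, 0.5 + 0j]

rate = beamforming_common_rate(w, channels, noise_vars)
```

In this toy case the two weakest users both see unit SNR, so the common rate is one bit per channel use. Because the logarithm is monotonic, maximizing this rate over beamformers is the same as maximizing the minimum SNR.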


VI.  Case  of  Frequency-Selective  Multipath 

Although we have focused our attention so far on
frequency-flat fading channels, the situation is quite similar in
the case of spatial beamforming7 for common information
transmission over frequency-selective (intersymbol
interference) channels. Let $\mathbf{h}_i^{(\ell)}$ denote the $\ell$th $N \times 1$ vector tap of
the baseband-equivalent discrete-time impulse response of the
multipath channel between the transmitter antenna array and
6A properly weighted common temporal signal is transmitted from each
antenna.

7It  is  perhaps  worth  emphasizing  that,  while  space-time  precoding  would 
generally  be  preferable  from  a  performance  point  of  view  when  the  channels 
are  time  dispersive,  we  (continue  to)  consider  spatial  beamforming  only  in  this 
section.  This  is  motivated  from  a  complexity  point  of  view.  Space-time  multi¬ 
cast  precoding  is  an  interesting  topic  for  future  research. 
the (single) receive antenna of receiver $i$. Assume that delay
spread is limited8 to $L$ nonzero vector channel taps. Define the
channel matrix for the $i$th receiver as

$$\mathbf{H}_i := \left[ \mathbf{h}_i^{(0)} \ \cdots \ \mathbf{h}_i^{(L-1)} \right].$$

Beamforming the transmit array with a fixed (time-invariant)
$\mathbf{w}^H$ yields a scalar equivalent channel from the viewpoint of
the $i$th receiver, whose scalar taps are given by

$$\bar{h}_i^{(\ell)} = \mathbf{w}^H \mathbf{h}_i^{(\ell)}, \quad \ell = 0, \ldots, L-1,$$

or, in vector form,

$$\bar{\mathbf{h}}_i^T = \mathbf{w}^H \mathbf{H}_i.$$

Now, if a Viterbi equalizer is used for sequence estimation at the
receiver, then the parameter that determines performance is [3]

$$\frac{\mathbf{w}^H \mathbf{H}_i \mathbf{H}_i^H \mathbf{w}}{\sigma_i^2} = \operatorname{trace}(\mathbf{w}\mathbf{w}^H \mathbf{Q}_i),$$

where now $\mathbf{Q}_i := \mathbf{H}_i \mathbf{H}_i^H / \sigma_i^2$ and is generally of higher rank
than before, but otherwise things remain conceptually the
same. In particular, the relaxations $\mathcal{Q}_r$ and $\mathcal{F}_r$ and the
algorithms in Tables I and II can be employed as they were in the
frequency-flat case; only the definition of the input matrices
changes.
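
The change of input matrices can be illustrated on a small example: the quadratic form in the multipath matrix equals the total energy of the beamformed scalar taps divided by the noise variance. The sketch below is only an illustration; the helper names, the two-tap channel, and the weight vector are hypothetical.

```python
def tap_energy(w, taps, sigma2):
    """Return sum_l |w^H h^(l)|^2 / sigma2 over the vector taps of one receiver."""
    total = 0.0
    for tap in taps:
        total += abs(sum(wi.conjugate() * hi for wi, hi in zip(w, tap))) ** 2
    return total / sigma2


def build_Q(taps, sigma2):
    """Return Q = H H^H / sigma2, where the columns of H are the vector taps."""
    n = len(taps[0])
    return [[sum(t[r] * t[c].conjugate() for t in taps) / sigma2
             for c in range(n)] for r in range(n)]


def quad_form(w, Q):
    """Return w^H Q w."""
    n = len(w)
    return sum(w[r].conjugate() * Q[r][c] * w[c]
               for r in range(n) for c in range(n))


# Hypothetical receiver with L = 2 taps and N = 2 transmit antennas.
taps = [[1 + 1j, 0.5j], [0.2 - 0.3j, -1 + 0j]]
sigma2 = 0.5
w = [0.6 + 0.2j, -0.4j]

energy = tap_energy(w, taps, sigma2)
quadratic = quad_form(w, build_Q(taps, sigma2))
```

The two values coincide up to rounding, so a multipath receiver enters the relaxations through its matrix in exactly the same way as a frequency-flat receiver does.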


VII.  Insights  Afforded  via  Duality 
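Let us return to our original problem $\mathcal{Q}$, as follows:

$$\min_{\mathbf{w} \in \mathbb{C}^N} \ \|\mathbf{w}\|_2^2 \quad \text{subject to: } |\mathbf{w}^H \tilde{\mathbf{h}}_i|^2 \ge 1, \quad i \in \{1, \ldots, M\}.$$

We will now gain some insight into the quality of the
solution generated by the semidefinite relaxation of $\mathcal{Q}$ using
bounds obtained from duality. For convenience, we first
convert the problem to real-valued form; this yields a $2N \times 1$
vector of real variables $\mathbf{x} := [\mathrm{Re}\{\mathbf{w}\}^T \ \mathrm{Im}\{\mathbf{w}\}^T]^T$, and
the $\mathbf{Q}_i$'s are now $2N \times 2N$ symmetric matrices of rank 2:
$\mathbf{Q}_i := \mathbf{g}_i \mathbf{g}_i^T + \hat{\mathbf{g}}_i \hat{\mathbf{g}}_i^T$, where $\mathbf{g}_i := [\mathrm{Re}\{\tilde{\mathbf{h}}_i\}^T \ \mathrm{Im}\{\tilde{\mathbf{h}}_i\}^T]^T$
and $\hat{\mathbf{g}}_i := [\mathrm{Im}\{\tilde{\mathbf{h}}_i\}^T \ -\mathrm{Re}\{\tilde{\mathbf{h}}_i\}^T]^T$. Problem $\mathcal{Q}$ can then be
rewritten as

$$\mathcal{P}: \quad \min_{\mathbf{x} \in \mathbb{R}^{2N}} \ \mathbf{x}^T \mathbf{x} \quad \text{subject to: } \mathbf{x}^T \mathbf{Q}_i \mathbf{x} \ge 1, \quad i \in \{1, \ldots, M\}.$$
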
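The Lagrangian of problem $\mathcal{P}$ is [2]

$$\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}) = \mathbf{x}^T \mathbf{x} + \sum_{i=1}^{M} \lambda_i \left(1 - \mathbf{x}^T \mathbf{Q}_i \mathbf{x}\right) = \mathbf{x}^T \left( \mathbf{I} - \sum_{i=1}^{M} \lambda_i \mathbf{Q}_i \right) \mathbf{x} + \sum_{i=1}^{M} \lambda_i$$

and the dual problem is

$$\max_{\boldsymbol{\lambda} \succeq 0} \ \min_{\mathbf{x}} \ \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda})$$

where $\boldsymbol{\lambda} \succeq 0$ denotes $\lambda_i \ge 0$, $\forall i$. If the symmetric matrix
$(\mathbf{I} - \sum_{i=1}^{M} \lambda_i \mathbf{Q}_i)$ has a negative eigenvalue, then it is easy to see that
the quadratic term in $\mathcal{L}(\mathbf{x}, \boldsymbol{\lambda})$ is unbounded from below (e.g.,
choose $\mathbf{x}$ proportional to the corresponding eigenvector). If, on
the other hand, all eigenvalues are greater than or equal to zero,
then the said matrix is positive semidefinite and the minimum
over $\mathbf{x}$ is attained, e.g., at $\mathbf{x} = \mathbf{0}$. This yields the following
equivalent of the dual problem:

8or, essentially limited; the remaining taps can be treated as interference.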

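$$\max_{\lambda_i \in \mathbb{R}} \ \sum_{i=1}^{M} \lambda_i \quad \text{subject to: } \mathbf{I} - \sum_{i=1}^{M} \lambda_i \mathbf{Q}_i \succeq 0, \quad \lambda_i \ge 0, \ i = 1, \ldots, M$$

which is a semidefinite program.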
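
Both the real-valued reformulation and the lower-bounding role of the dual can be checked numerically. The sketch below is an illustration under stated assumptions: it verifies that the real quadratic form reproduces the complex gain, and then compares one convenient dual-feasible point with one primal-feasible point. The dual point used here, with multipliers $1/(M\|\tilde{\mathbf{h}}_i\|_2^2)$, is our own choice and is generally not the dual optimum; it is feasible because the largest eigenvalue of each rank-2 matrix equals the squared norm of the corresponding channel. All names and values are hypothetical.

```python
def real_form(v):
    """Return [Re{v}^T Im{v}^T]^T as a real list."""
    return [z.real for z in v] + [z.imag for z in v]


def real_form_hat(v):
    """Return [Im{v}^T -Re{v}^T]^T as a real list."""
    return [z.imag for z in v] + [-z.real for z in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def inner(w, h):
    """Return w^H h."""
    return sum(wi.conjugate() * hi for wi, hi in zip(w, h))


def quad_real(x, h):
    """Return x^T (g g^T + g_hat g_hat^T) x for the channel h."""
    return dot(x, real_form(h)) ** 2 + dot(x, real_form_hat(h)) ** 2


# Toy normalized channels and an arbitrary weight vector (illustrative only).
channels = [[1 + 1j, 0.5 - 0.2j], [0.3j, -1 + 0.5j], [2 + 0j, 1j]]
w = [1 + 0j, 1j]
x = real_form(w)

# Primal-feasible point: scale w so that the weakest constraint is tight.
worst = min(abs(inner(w, h)) ** 2 for h in channels)
primal_value = sum(abs(wi) ** 2 for wi in w) / worst

# Dual-feasible point: lambda_i = 1 / (M * ||h_i||^2), an assumed convenient choice.
M = len(channels)
dual_value = sum(1.0 / (M * sum(abs(z) ** 2 for z in h)) for h in channels)
```

The dual value stays below the primal value, as weak duality requires; solving the dual SDP exactly would give the tightest such bound.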

The dual problem is interesting, because the maximum of the
dual problem is a lower bound on the minimum of the original
(primal) problem [2]. The dual problem is convex by virtue of its
definition; however, the particular dual studied above is special
in the sense that optimization over $\mathbf{x}$ for a given $\boldsymbol{\lambda}$ can be carried
out analytically, and the residual $\boldsymbol{\lambda}$-optimization problem is an
SDP. This means that we can solve the dual problem and thus
obtain the tightest bound obtainable via duality. This
duality-derived bound can be compared to the SDR bound we used
earlier. Let $\mathcal{D}(\cdot)$ denote the dual of a given optimization problem,
and let $\mathcal{R}(\mathcal{P})$ denote the semidefinite relaxation of $\mathcal{P}$, obtained
by dropping the associated rank-one constraint. Furthermore, let
$\beta(\cdot)$ denote the optimal value of a given optimization problem.

Theorem 1: [14, pp. 403-404] $\mathcal{D}(\mathcal{D}(\mathcal{P})) = \mathcal{R}(\mathcal{P})$ and
$\beta(\mathcal{R}(\mathcal{P})) = \beta(\mathcal{D}(\mathcal{P}))$.

More specifically, Theorem 1 states that the dual of the dual
of $\mathcal{P}$ is the SDR of $\mathcal{P}$ and that the optimal objective value of the
SDR of $\mathcal{P}$ is the same as the optimal objective value of the dual
of $\mathcal{P}$. Hence, SDR yields the same lower bound on the optimal
value of $\mathcal{P}$ as that obtained from duality, and the associated gap
between this bound and the optimal value is equal to the duality
gap.

Theorem 1 along with Claim 2 directly yield the following
corollary for the max-min-fair problem $\mathcal{F}$.

Corollary 2: $\mathcal{D}(\mathcal{D}(\mathcal{F})) = \mathcal{R}(\mathcal{F})$ and
$\beta(\mathcal{R}(\mathcal{F})) = \beta(\mathcal{D}(\mathcal{F}))$.

VIII.  Simulation  Results 

An appropriate figure of merit for the performance of the
proposed algorithm for the QoS beamforming problem $\mathcal{Q}$ would
be the ratio of the minimum transmitted power achieved by the
proposed algorithm and $\beta(\mathcal{Q})$, the transmitted power achieved
by the (true) optimal solution. Unfortunately, problem $\mathcal{Q}$ is
NP-hard, and thus $\beta(\mathcal{Q})$ can be difficult to compute. However,
we can replace $\beta(\mathcal{Q})$ in the figure of merit by the lower bound
obtained from the SDR; i.e., $\beta(\mathcal{Q}) \ge \beta(\mathcal{Q}_r) = \operatorname{trace}(\mathbf{X}_{opt})$. If
we let $\{\mathbf{w}_\ell\}$ denote the sequence of candidate weight vectors
generated via randomization, and $\check{\mathbf{w}}_\ell$ denote the minimally
scaled version of $\mathbf{w}_\ell$ that satisfies the constraints of problem
$\mathcal{Q}$, then a meaningful and easily computable figure of merit is
$(\min_\ell \|\check{\mathbf{w}}_\ell\|_2^2)/\operatorname{trace}(\mathbf{X}_{opt})$. We will call this ratio the upper
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TABLE  III 

MC  Simulation  Results  for  QoS  Beamforming:  Mean 
and  Standard  Deviation  of  Upper  Bound  on  Power  Boost. 
Each  Element  of  h,  Is  i.i.d.  With  a  Circularly  Symmetric 
Complex  Gaussian  (Rayleigh)  Distribution  of  Variance  1. 
All  Three  Randomization  Techniques  (randA,  randB, 
randc)  Are  USED  IN  PARALLEL,  FOR  1000 
Randomizations  Each.  pminj! of  =  l,Vi 


N/M 

mean 

std 

4/8 

1.12 

0.16 

4/16 

1.47 

0.30 

8/16 

1.82 

0.37 

8/32 

2.79 

0.47 

TABLE  IV 

MC  Simulation  Results  for  QoS  Beamforming:  Mean  and 
Standard  Deviation  of  Upper  Bound  on  Power  Boost.  Here, 
THE  Number  OF  Post-SDR  randomizations  =  30  JVM.  REMAINING 
Parameters  Are  as  in  Table  III 


N/M 

mean 

std 

4/8 

1.12 

0.16 

4/16 

1.44 

0.29 

8/16 

1.76 

0.34 

8/32 

2.49 

0.38 

bound on the power boost required to satisfy the constraints. If
our algorithm achieves a power boost of $\eta$, then the transmitted
power is guaranteed to be within a factor $\eta$ of that of the optimal
solution $\beta(\mathcal{Q})$ and will often be closer.

A.  Rayleigh  Fading  Wireless  Channels 

We consider the standard independent and identically
distributed (i.i.d.) Rayleigh fading model described in the caption
of Table III. That table summarizes the results obtained using
the direct QoS relaxation algorithm in Table I ($\rho_{\min,i}\sigma_i^2 = 1$,
$\forall i$) with all three randomization options (randA, randB, and
randC) employed in parallel, for a fixed number of 1000
randomization samples each. Table IV summarizes results for the
same scenario, except that 30 NM randomization samples are
drawn for each randomization strategy; thus the number of
randomizations grows linearly in the problem size. Note that, in
many cases, our solutions are within 3-4 dB from the
(generally optimistic) lower bound on transmitted power provided by
SDR, and thus are guaranteed to be at most 3-4 dB away from
optimal; this is often good enough from an engineering
perspective. Comparing the corresponding entries in Tables III and IV,
it is evident that switching from 1000 to 30 NM
randomizations per channel realization only yields a minor performance
improvement in the cases considered.
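
To relate the tabulated ratios to the decibel figures quoted above, the mean power-boost bounds of Tables III and IV can be converted with $10\log_{10}(\cdot)$. The short sketch below does only that conversion; the dictionary and function names are hypothetical, and the numbers are the mean values copied from the two tables.

```python
import math

# Mean upper bounds on the power boost, copied from Tables III and IV.
table_iii_mean = {"4/8": 1.12, "4/16": 1.47, "8/16": 1.82, "8/32": 2.79}
table_iv_mean = {"4/8": 1.12, "4/16": 1.44, "8/16": 1.76, "8/32": 2.49}


def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


boost_db_iii = {key: to_db(value) for key, value in table_iii_mean.items()}
boost_db_iv = {key: to_db(value) for key, value in table_iv_mean.items()}
```

For instance, the mean ratio of 2.79 for N/M = 8/32 in Table III corresponds to roughly 4.5 dB, and the ratio of 2.49 in Table IV to roughly 4.0 dB; the smaller configurations fall below 3 dB on average.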

Table V summarizes our simulation results for max-min fair
beamforming, using the direct algorithm in Table II ($\sigma_i^2 = 1$,
$\forall i$, $P = 1$). Table V presents Monte Carlo averages for the
upper bound on the minimum SNR (the optimum attained


TABLE V

MC Simulation Results for Max-Min Fair Beamforming: Averages
for the upper bound on $\min_i \mathrm{SNR}_i$, the relaxation-attained
$\min_i \mathrm{SNR}_i$, the $\min_i \mathrm{SNR}_i$ attained by maximizing average
SNR (across users), and the $\min_i \mathrm{SNR}_i$ for the case of no
beamforming. The results are averaged over 1000
Monte Carlo (MC) runs. For each MC run, the
elements of $\mathbf{h}_i$ are independently redrawn from
a circularly symmetric complex Gaussian
distribution of variance 1. $\sigma_i^2 = 1$, $\forall i$, $P = 1$.
All three randomization techniques (randA,
randB, randC) are used in parallel, for
30 NM randomizations each

N/M     upper bound    SDR     Max Avg SNR    no beamforming
4/8     1.05           0.94    0.25           0.12
4/16    0.73           0.51    0.11           0.06
8/16    1.43           0.86    0.14           0.06
8/32    1.07           0.45    0.06           0.03

in problem $\mathcal{F}_r$), the SDR-attained minimum SNR (after
randomization), the minimum SNR attained by the maximum average SNR
beamformer$^{9}$ [7, ch. 5], and the minimum SNR for the case of no
beamforming. For the latter, we have used
$\mathbf{w} = (1/\sqrt{N})\mathbf{1}_{N \times 1}$, which fixes
transmitted power to 1. Under the i.i.d. Rayleigh fading assumption, this
is equivalent to selecting an arbitrary transmit antenna, allocating the
entire power budget to it, and shutting off all others. To see this, note
that the sum channel $(1/\sqrt{N})\mathbf{1}_{N \times 1}^T\mathbf{h}_i$
viewed by any particular receiver $i$ will still be Rayleigh, of the same
variance as the elements of $\mathbf{h}_i$. For this reason, we can view
the beamforming vector $\mathbf{w} = (1/\sqrt{N})\mathbf{1}_{N \times 1}$
as corresponding to no beamforming at all. All three randomization options
(randA, randB, and randC) were employed in parallel, for 30 NM samples
each. It is satisfying to note that the SDR solution attains a significant
fraction of the (possibly unattainable) upper bound. Furthermore, the SDR
technique provides a substantial improvement in the average minimum SNR
relative to no beamforming and to maximum average SNR beamforming
[7, ch. 5]. Like SDR, maximum average SNR beamforming uses full CSI at the
transmitter. However, it is generally not meaningful to compare designs
produced under different objectives. Accordingly, the maximum average SNR
beamforming results in Table V are only meant to convey an idea of how
much QoS improvement SDR can provide over computationally simpler
solutions that also exploit full CSI.
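
The no-beamforming equivalence argued above can be checked in one line.
As a sketch, assume the entries $h_{i,n}$ of $\mathbf{h}_i$ are i.i.d.
zero-mean circular Gaussian with common variance $\sigma_h^2$ (a symbol
introduced here only for illustration). A scaled sum of independent
Gaussians is again Gaussian, and its variance is

```latex
\operatorname{var}\!\left(\frac{1}{\sqrt{N}}\sum_{n=1}^{N} h_{i,n}\right)
  = \frac{1}{N}\sum_{n=1}^{N}\operatorname{var}(h_{i,n})
  = \frac{1}{N}\cdot N\,\sigma_h^2
  = \sigma_h^2 ,
```

so the receiver sees the same channel statistics as it would with a single
active antenna carrying the full power budget.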

We observe from Tables III-V that, as N and/or M increase, the quality of
the approximate solution drifts away from the respective
relaxation/duality bound. This could be due to a variety of factors, or a
combination thereof. First, the relaxation bound may become more
optimistic at higher N and/or M; remember that it is only a bound, not
necessarily a tight bound. If this is true, then the apparent degradation
may in fact be much milder in reality. Second, the number of
randomizations required to attain a quasi-optimal solution may increase
faster than linearly in the product NM. Third, the approximation quality
of the method per se may degrade as the problem size grows. In a related,
but distinct, problem the quality of the SDR approximation degrades
logarithmically in the problem size [10].

$^{9}$This beamformer maximizes the average SNR for each channel matrix
realization (Monte Carlo run), where the average is taken over the users.
The resulting beamforming vector is also scaled to unit norm.

[Figure 1: polar plot of the optimized beam pattern. Plot title: N =
8-element Tx ULA (d/lambda = 1/2); M = 24 downlink users; constraints =
ones(M,1); Nrand = 300. Scenario: 6 clusters of 4 users each at
[-51, -31, -11, 11, 31, 51] deg.]

Fig. 1. Broadcast beamforming example using algorithm in Table I.
Optimized beam pattern for N = 8-element transmit ULA ($d/\lambda = 1/2$)
and M = 24 downlink users, in six clusters of four users each. Clusters
centered at [-51, -31, -11, 11, 31, 51]° with extent ±2°. Channel vectors
are Vandermonde, of element modulus 1. $\sigma_i^2 = \sigma^2$,
$\forall i$; $\rho_{\min,i} = 1/\sigma^2$, $\forall i$ (here, $\sigma_i^2$
also models propagation loss, in addition to thermal noise). Symmetric
lobes appear due to the inherent ULA ambiguity. randA, number of post-SDR
randomizations = 300. In this case, the solution is guaranteed to be
within 0.1% of the optimum.

B.  Far-Field  Beamforming  for  a  Uniform  Linear  Transmit 
Antenna  Array 

In several scenarios, the solutions generated by the SDR technique are
essentially optimal. This is illustrated in Fig. 1, which shows the
optimized transmit beampattern for a particular far-field multicasting
scenario using a uniform linear antenna array (ULA); the details of the
simulation setup are included in the figure caption for ease of reference.

C.  Measured  VDSL  Channels 

In  this  section,  we  test  the  performance  of  our  algorithms 
using  measured  VDSL  channel  data  collected  by  France 
Telecom  R&D  as  part  of  the  EU-FP6  U-BROAD  project  # 
506790. 

Gigabit VDSL technology for very short twisted copper loops (on the order
of 100-500 m) is currently under development in the context of fiber to
the basement (FTTB) or fiber to the curb/cabinet (FTTC) hybrid access
solutions. Multiple-input multiple-output (MIMO) transmission modalities
are an important component of gigabit VDSL. These so-called vectoring
techniques rely on transmit precoding and/or multiuser detection to
provide reliable communication at very high transmission rates [5].
Transmit precoding is particularly appealing when the targeted receivers
are not physically co-located, or when legacy equipment is being used at
the receive site. In both cases, multiuser detection is not feasible. In
this context, media streaming (e.g., news-feed, pay-per-view, or
video-conferencing) may involve multiple recipients in the same binder.

Let N denote the number of loops subscribing to a given multicast. With
multicarrier transmission, each tone can be viewed as a flat-fading MIMO
channel with N inputs and N outputs, plus noise and alien interference.
The diagonal of the channel matrix consists of samples of the N direct
[insertion loss (IL)] channel frequency responses, while off-diagonal
elements are drawn from the corresponding FEXT channel frequency
responses. Due to the noncoherent combining of the self-FEXT coupling
coefficients, the useful signal power received at each output terminal is
reduced, even when all inputs are fed with the same information-bearing
signal. That is, the equivalent channel tap at frequency $f$ is

$$h_n(f) = h_n^{\mathrm{IL}}(f) + \sum_{n_{\mathrm{FEXT}}=1}^{N-1} h_{n_{\mathrm{FEXT}}}^{\mathrm{FEXT}}(f),$$

where $h_n^{\mathrm{IL}}(f)$ denotes the direct (insertion loss) channel,
and $h_{n_{\mathrm{FEXT}}}^{\mathrm{FEXT}}(f)$ denotes a generic FEXT
interference channel.

Conceptually, the scenario is very similar to the wireless scenario
considered earlier, but with two key differences: now N = M, and the
channel matrix $\mathbf{H} := [\mathbf{h}_1, \mathbf{h}_2, \ldots,
\mathbf{h}_M]$ is diagonally dominated, because FEXT coupling is much
weaker than insertion loss. The question then is whether transmit
precoding can provide a meaningful benefit relative to simply ignoring
FEXT altogether.

We use IL and far-end FEXT measured data for S88 cable comprising 14
quads, i.e., 28 loops. The length of the cable is 300 m. For each channel,
a log-frequency sweeping scheme was used to measure the I/Q components of
the frequency response from 10 kHz to 30 MHz, yielding 801 complex samples
per channel. Cubic spline complex interpolation was used to convert these
samples to a linear frequency scale. We consider 17 N × N channel
matrices, with N = 14, in the frequency range 21.5 to 30 MHz. Insertion
loss drops between -40 and -45 dB in this range of frequencies, while FEXT
coupling is between -77 and -82 dB in the mean, with over 10-dB standard
deviation and significant variation across frequency as well. For each
channel matrix, we apply our max-min-fair beamforming algorithm with
$\sigma_i^2 = \sigma^2$, $\forall i$, and $P/\sigma^2 = 1$. Fig. 2 shows
the resulting plots of minimum received signal power, the associated
relaxation/duality bound, and the minimum received signal power when no
precoding is used. We observe that SDR can almost double the minimum
received signal power relative to no precoding, and it often attains zero
gap relative to the relaxation/duality bound. For shorter loops (e.g.,
100 m), the situation is even more in favor of SDR, because then FEXT
resembles near-end crosstalk (NEXT) and is relatively more pronounced.

D.  Further  Observations 

1) Comparison of the Two Relaxations: We have shown theoretically that
the two problem formulations (QoS, $\mathcal{Q}$, and max-min-fair,
$\mathcal{F}$) are algorithmically equivalent, i.e., had we had an optimal
algorithm that provides the exact solution to one, it could have also been
used to obtain the exact solution to the other. What we have instead is
two generally approximate algorithms, obtained by direct relaxation of the
respective problems. The link between the two formulations can still be
exploited. For example, we may obtain an approximate solution to the
max-min-fair problem by first running the QoS algorithm in Table I with
all the $\rho_{\min,i} = 1$, then scaling the resulting solution to the
desired power level P. Of course, we can also use the direct relaxation of
the max-min-fair problem in Table II. Due to approximation, there is no a
priori reason to expect that the two solutions will be identical, even in
the mean.

Fig. 2. Transmit precoding for VDSL multicasting.

[Figure 3: plot title: H = randn(4,16)+j*randn(4,16); x axis: MC trial
(total = 300).]

Fig. 3. Comparison of direct and indirect solutions to the max-min-fair
problem.

In order to address this issue, we have compared the two strategies by
means of Monte Carlo simulation. We chose N = 4, M = 16,
$\sigma_i^2 = 1$, $\forall i$, and P = 1, and ran both algorithms for 300
i.i.d. Rayleigh fading channels. All three randomizations (randA, randB,
randC) were employed in parallel, for 30 NM randomization samples each.
For each channel, we recorded the percent gap (100 times the gap over the
relaxation bound) of each algorithm. Fig. 3 shows a portion of the
results, along with the mean percent gap attained by each algorithm
(averaged over all 300 channels). By "direct" we refer to the algorithm in
Table II, whereas by "indirect," we refer to the algorithm in Table I with
all $\rho_{\min,i} = 1$, followed by scaling.
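
The scaling step of the "indirect" strategy and the percent-gap metric
used in this comparison can be sketched as follows. This is only an
illustration under stated assumptions: the function names are
hypothetical, the QoS beamformer `w` is taken as given (the Table I
algorithm itself is not reproduced), and unit noise variance is assumed so
that received power and SNR coincide.

```python
import math

def scale_to_power(w, P):
    """Rescale a beamformer (list of complex weights) so that ||w||^2 = P."""
    norm_sq = sum(abs(x) ** 2 for x in w)
    gain = math.sqrt(P / norm_sq)
    return [gain * x for x in w]

def min_received_power(w, channels):
    """Return min_i |w^H h_i|^2 over the channel vectors h_i."""
    return min(
        abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2
        for h in channels
    )

def percent_gap(bound, attained):
    """100 times the gap between attained value and relaxation bound,
    normalized by the bound."""
    return 100.0 * abs(bound - attained) / bound
```

For instance, a QoS solution `w = [2.0, 0.0]` scaled to `P = 1` becomes
`[1.0, 0.0]`, and its minimum received power over the users can then be
compared against the max-min-fair relaxation bound through
`percent_gap`.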


[Figure 4: percent gap versus MC trial (x axis: MC trial, 0 to 300).]

Fig. 4. Percent gap outcomes for 300 real Gaussian channel realizations.


We observe that the mean percent gaps of the two algorithms are virtually
identical, and in fact most of the respective percent gaps are very close
on a sample-by-sample basis. However, there are instances wherein each
algorithm is significantly better than the other (over 10% difference in
the gap). Two pronounced cases are highlighted by arrows in Fig. 3. We
conclude that, while both approaches are equally effective on average, it
pays to use both, if possible, in certain cases.

2) On the Dependence of Gap Statistics on Channel Statistics: We have seen
that, for i.i.d. circular Gaussian (Rayleigh) channel matrices, the gap
between our relaxation-randomization approximate solutions and the
relaxation/duality bound might not be insignificant. We have also seen
cases wherein the gap is very small, cf. the far-field uniform linear
transmit antenna array example, and a good proportion of the VDSL channels
tested earlier.

It is evident that the gap statistics depend on the channel statistics.
Interestingly, the gap statistics are far more favorable for real (as
opposed to complex circular) i.i.d. Gaussian channels. This is illustrated
in Fig. 4, using the QoS algorithm in Table I for N = 4, M = 8,
$\rho_{\min,i}\sigma_i^2 = 1$, $\forall i$, and 300 real i.i.d. Gaussian
channels. All three randomizations (randA, randB, randC) are employed in
parallel, for 30 NM randomization samples each. For each channel, we
recorded the percent gap (100 times the gap over the relaxation bound) of
the algorithm in Table I. Observe that for about 95% of the channels the
percent gap is down to numerical accuracy in this case. Contrast this
situation with Fig. 5, which shows the respective results for complex
circular Gaussian channel matrices: the difference is remarkable.

There are other cases where we have observed that the relaxation approach
operates close to zero gap. One somewhat contrived case is when the real
and imaginary parts of the channel coefficients are nonnegative. This is
illustrated in Fig. 6, where it is worth noting that the scaling of the y
axis is $10^{-8}$. In this case, the gap hovers around numerical accuracy,
without exhibiting any bad runs at all for the 300 channel matrices
considered.

[Figure 5: plot title: H = randn(4,8)+j*randn(4,8);]

Fig. 5. Percent gap outcomes for 300 complex circular Gaussian (Rayleigh)
channel realizations.


Fig. 6. Percent gap outcomes for 300 channel realizations with positive
real and imaginary parts (uniformly distributed between 0 and 1). Note
that the scaling of the y axis is $10^{-8}$.

In conclusion, the complex circular Gaussian channel case appears to be
the least favorable of the scenarios considered.

IX. Conclusion

We have taken a new look at the broadcasting/multicasting problem when
channel state information is available at the transmitter. We have
proposed two pertinent problem formulations: minimizing transmitted power
under multiple minimum received power constraints, and maximizing the
minimum received power subject to a bound on the transmitted power. We
have shown that both formulations are NP-hard optimization problems;
however, their solution can often be well approximated using semidefinite
relaxation tools. We have explored the relationship between the two
formulations and also insights afforded by Lagrangian duality theory. In
view of i) our extensive numerical experiments with simulated and measured
data, verifying that semidefinite relaxation consistently yields good
performance, ii) proof that the basic problem is NP-hard, and thus
approximation is unavoidable, and iii) corroborating motivation provided
by duality theory, we conclude that the approximate solutions provided
herein offer useful designs across a broad range of applications.

It would be useful to analyze the duality gap for the problem at hand, for
this would yield a priori bounds on the degree of suboptimality introduced
by relaxation, as opposed to the a posteriori bound that we now have by
virtue of Theorem 1. Our numerical results indicate that the degree of
suboptimality is often acceptable in our intended applications. In an
effort to understand the apparent success of the SDR approach (e.g., in
the case where the channel vectors have nonnegative real and imaginary
parts), one can consider the following simple linearly constrained convex
quadratic program (QP) restriction of the QoS problem:

$$\mathcal{Q}_s:\quad \min_{\mathbf{w} \in \mathbb{C}^N} \|\mathbf{w}\|_2^2 \quad \text{subject to: } \mathrm{Re}\{\mathbf{h}_i^H\mathbf{w}\} \ge 1, \ \text{for all } i.$$


Notice that the feasible region of this problem is a subset of that of the
original nonconvex (and NP-hard) QoS formulation $\mathcal{Q}$. Thus,
$P^* \le \bar{P}$, where $P^*$ and $\bar{P}$ denote the minimum
beamforming power obtained from optimal solutions of $\mathcal{Q}$ and
$\mathcal{Q}_s$, respectively. We have recently shown [16] that the gap
between $P^*$ and $\bar{P}$ is never more than a factor of
$1/\cos^2(\alpha/2)$, where $\alpha$ is the maximum phase spread across
the different users measured at each transmit antenna and is assumed to be
less than $\pi$. Notice that the two cases where channel vectors i) are
real and nonnegative or ii) have nonnegative real and imaginary parts
correspond to $\alpha = 0$ and $\alpha \le \pi/2$. Thus, $\mathcal{Q}_s$
provides an exact solution in the first case and a factor of 2
approximation in the second case. These results indicate that problem
$\mathcal{Q}$ is well approximated by $\mathcal{Q}_s$ if the phase spread
$\alpha$ is small.
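
The two special cases quoted above follow directly from evaluating the
factor $1/\cos^2(\alpha/2)$. The short sketch below (the function name is
hypothetical, introduced only for illustration) evaluates it for a given
phase spread:

```python
import math

def qs_approx_factor(alpha):
    """Worst-case factor 1 / cos^2(alpha / 2) for a phase spread alpha
    (radians), defined for 0 <= alpha < pi."""
    if not 0.0 <= alpha < math.pi:
        raise ValueError("phase spread must satisfy 0 <= alpha < pi")
    return 1.0 / math.cos(alpha / 2.0) ** 2
```

A spread of zero gives a factor of 1 (the restriction is exact), a spread
of $\pi/2$ gives a factor of 2, and the factor grows without bound as the
spread approaches $\pi$.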

There are many other interesting extensions to the algorithms developed
herein: e.g., robustness issues, and multiple cochannel multicasting
groups. These are subjects of ongoing work and will be reported elsewhere.
Furthermore, aside from transmit beamforming/precoding, there are also
more traditional signal processing applications of the proposed
methodology. One is linear filter design, in particular, the design of a
linear "batch" filter that responds to certain prescribed frequencies in
its input and attenuates all other frequencies. In this setting, the
$\mathbf{h}_i$ vectors will be Vandermonde, with generators
$e^{j\omega_i}$ and $\omega_i \in [-\pi, \pi)$. One may easily envision
scenarios wherein such a problem formulation can be appropriate:
radio-astronomy applications, frequency-diversity combining, and
frequency-hopping communications. The context can be further generalized:
design a linear filter that responds to prescribed but otherwise arbitrary
signals in its input, while attenuating all else.

Appendix I
Proof of Claim 1

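Before dealing with Claim 1 directly, we first consider the following
restriction of the QoS problem $\mathcal{Q}$: the case when all
$\mathbf{h}_i$ are real, and optimization is over $\mathbb{R}^N$. We will
show that$^{10}$

$$\mathcal{S}:\quad \min_{\mathbf{x} \in \mathbb{R}^N} \mathbf{x}^T\mathbf{x} \quad \text{subject to: } |\mathbf{x}^T\mathbf{h}_m| \ge 1, \ m \in \{1, \ldots, M\}$$

contains

$$\mathcal{A}:\quad \min_{\{y_n\}} \left( y_1^2 + \cdots + y_N^2 + \Big(\sum_{n=1}^{N} a_n y_n\Big)^2 \right) \quad \text{subject to: } y_n^2 \ge 1, \ n \in \{1, \ldots, N\}$$

as a special case and that problem $\mathcal{A}$ is at least as hard as
the following problem:

Partition Problem $\Pi$: Given integers $a_1, \ldots, a_N$, do there exist
binary variables $\{x_n\}_{n=1}^{N} \in \{+1, -1\}^N$, such that
$\sum_{n=1}^{N} a_n x_n = 0$?

This is known to be NP-complete [4].

It is easy to check that the optimal value of problem $\mathcal{A}$ is
equal to N if and only if the answer to problem $\Pi$ is affirmative.
Thus, solving problem $\mathcal{A}$ is at least as hard as solving problem
$\Pi$.

To show that problem $\mathcal{S}$ contains problem $\mathcal{A}$ (i.e.,
an arbitrary instance of problem $\mathcal{A}$ can be posed as a special
instance of problem $\mathcal{S}$), note that $y_n^2 \ge 1$ can be written
as $|\mathbf{y}^T\mathbf{e}_n| \ge 1$, where
$\mathbf{y} := [y_1, \ldots, y_N]^T$ and $\mathbf{e}_n$ contains one in
the nth position and zeros elsewhere. Furthermore

$$y_1^2 + \cdots + y_N^2 + \Big(\sum_{n=1}^{N} a_n y_n\Big)^2 = \mathbf{y}^T(\mathbf{I} + \mathbf{a}\mathbf{a}^T)\mathbf{y} = \mathbf{y}^T\mathbf{Q}\mathbf{y}$$

where $\mathbf{a} := [a_1, \ldots, a_N]^T$ and
$\mathbf{Q} := \mathbf{I} + \mathbf{a}\mathbf{a}^T$. The matrix
$\mathbf{Q}$ is positive definite. Let
$\mathbf{Q} = \mathbf{S}^T\mathbf{S}$, and
$\mathbf{x} := \mathbf{S}\mathbf{y}$. Then
$\mathbf{y}^T\mathbf{Q}\mathbf{y} = \mathbf{x}^T\mathbf{x}$,
$\mathbf{y} = \mathbf{S}^{-1}\mathbf{x}$, and
$|\mathbf{y}^T\mathbf{e}_n| \ge 1$ can be written as
$|\mathbf{x}^T\mathbf{S}^{-T}\mathbf{e}_n| \ge 1$, or, with
$\mathbf{h}_n := \mathbf{S}^{-T}\mathbf{e}_n$, as
$|\mathbf{x}^T\mathbf{h}_n| \ge 1$. This shows that an arbitrary instance
of problem $\mathcal{A}$ can be transformed to a special instance of
problem $\mathcal{S}$ (with M = N). Thus, $\mathcal{S}$ is at least as
hard as $\mathcal{A}$, which is at least as hard as the partition
problem. □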
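
The link between problem $\mathcal{A}$ and the partition problem can be
illustrated with a small exhaustive search. This is a sketch for intuition
only (the function names are hypothetical): it enumerates all $2^N$ sign
vectors, which is exponential in N and therefore says nothing about
efficient solvability. Restricting $y_n$ to $\pm 1$ suffices for deciding
whether the value N is attained, because $\sum_n y_n^2 = N$ requires
$|y_n| = 1$ for every n, and the squared term then vanishes only if the
signs split the $a_n$ into two equal-sum groups.

```python
from itertools import product

def objective_A(a, y):
    """Objective of problem A: sum of y_n^2 plus (sum of a_n * y_n)^2."""
    return sum(v * v for v in y) + sum(ai * yi for ai, yi in zip(a, y)) ** 2

def best_sign_vector(a):
    """Exhaustively pick the sign vector y in {+1, -1}^N minimizing the
    objective of problem A (exponential-time illustration)."""
    return min(product((1, -1), repeat=len(a)),
               key=lambda y: objective_A(a, y))

def partition_exists(a):
    """True if the best sign vector attains the value N = len(a)."""
    return objective_A(a, best_sign_vector(a)) == len(a)
```

For `a = [1, 2, 3]` the search finds signs with $1 + 2 - 3 = 0$ and the
objective equals N = 3; for `a = [1, 2, 4]` no such signs exist and the
best sign vector yields a value above N.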

Proof of Claim 1: The QoS Problem $\mathcal{Q}$ is NP-hard: Consider the
problem

$$\min_{\mathbf{w} \in \mathbb{C}^N} \mathbf{w}^H\mathbf{w} \quad \text{subject to: } |\mathbf{w}^H\mathbf{h}_i| \ge 1, \ i = 1, \ldots, M. \tag{1}$$

Define the N × M matrix
$\mathbf{H} = [\mathbf{h}_1, \ldots, \mathbf{h}_M]$, and the M × 1 vector
$\mathbf{z}$, with $\mathbf{z}^H := \mathbf{w}^H\mathbf{H}$. Consider the
case that $M \ge N$, and $\mathbf{H}$ is full row-rank (N). Then
$\mathbf{w}^H = \mathbf{z}^H\mathbf{H}^{\dagger}$, where
$\mathbf{H}^{\dagger} = \mathbf{H}^H(\mathbf{H}\mathbf{H}^H)^{-1}$ denotes
the right pseudoinverse of $\mathbf{H}$, and the problem in (1) is
equivalent to

$$\min_{\mathbf{z} \in \mathbb{C}^M} \mathbf{z}^H\mathbf{Q}\mathbf{z} \quad \text{subject to: } |z_k| \ge 1, \ k = 1, \ldots, M \tag{2}$$

where $\mathbf{Q} := \mathbf{H}^{\dagger}(\mathbf{H}^{\dagger})^H \succeq 0$,
a positive semidefinite matrix of rank $N \le M$; and $z_k$ denotes the
kth element of the vector $\mathbf{z}$. We will show that problem (2) is
NP-hard in general. To this end, we consider a reduction from the
NP-complete partition problem [4]; i.e., given $a_1 > 0, a_2 > 0, \ldots,
a_P > 0$, decide whether or not a subset, say $I$, of $\{1, \ldots, P\}$
exists, such that

$$\sum_{k \in I} a_k = \frac{1}{2}\sum_{k=1}^{P} a_k. \tag{3}$$

Let M = 2P + 1 and let the complex-valued decision vector be

$$\mathbf{z} = [z_0, z_1, \ldots, z_P, z_{P+1}, \ldots, z_{2P}]^T \in \mathbb{C}^M.$$

Let us denote

where $\mathbf{1}_P$ denotes the length-P vector of ones, and
$\mathbf{0}_P$ is the length-P vector of zeros.

Next we show that a partition $I$ satisfying (3) exists if and only if the
optimization problem (2) has a minimum value of M. In other words, the
existence of $I$ is equivalent to the fact that there is
$\mathbf{z} \in \mathbb{C}^M$ such that
$\mathbf{z}^H\mathbf{Q}\mathbf{z} = M$ and $|z_k| \ge 1$, for all k. Since

$$\mathbf{z}^H\mathbf{Q}\mathbf{z} = \|\mathbf{A}\mathbf{z}\|_2^2 + \sum_{k=0}^{2P} |z_k|^2 \ge 2P + 1 = M, \quad \text{for } |z_k| \ge 1 \ \forall k,$$

it follows that

$$\mathbf{z}^H\mathbf{Q}\mathbf{z} = M, \quad |z_k| \ge 1 \ \text{for all } k$$

is equivalent to

$$\mathbf{A}\mathbf{z} = \mathbf{0}, \quad |z_k| = 1 \ \text{for all } k.$$

The latter gives rise to a set of linear equations, (4) and (5).

The $z_k$'s are all constrained to be on the unit circle; thus let
$z_k/z_0 = e^{j\theta_k}$ for $k = 1, \ldots, 2P$. Using (4), we have

$$\cos\theta_k + \cos\theta_{P+k} = 1 \tag{6}$$

$$\sin\theta_k + \sin\theta_{P+k} = 0 \tag{7}$$

where $k = 1, \ldots, P$. These two equations imply that
$\theta_k \in \{-\pi/3, \pi/3\}$ for all k. This, in particular, means
that $\cos\theta_k = \cos\theta_{P+k} = 1/2$ for $k = 1, \ldots, P$,
implying that

$$\mathrm{Re}\left\{\sum_{k=1}^{P} \frac{a_k z_k}{z_0}\right\} = \frac{1}{2}\sum_{k=1}^{P} a_k.$$

Therefore, (5) is satisfied if and only if

$$\mathrm{Im}\left\{\sum_{k=1}^{P} \frac{a_k z_k}{z_0}\right\} = 0$$

with $\theta_k \in \{-\pi/3, \pi/3\}$ for all k, and thus
$\sin\theta_k \in \{\sqrt{3}/2, -\sqrt{3}/2\}$, which is equivalent to the
existence of a partition $I$ of $\{a_1, \ldots, a_P\}$ such that (3)
holds. In fact, we can simply take $I = \{k \mid \theta_k = \pi/3\}$. □

$^{10}$We henceforth use $\mathbf{h}_m$ to denote possibly scaled channel
vectors, dropping the tilde for brevity.

[9] W.-K. Ma, T. N. Davidson, K. M. Wong, Z.-Q. Luo, and P.-C. Ching,
"Quasi-ML multiuser detection using semi-definite relaxation with
application to synchronous CDMA," IEEE Trans. Signal Process., vol. 50,
no. 4, pp. 912-922, Apr. 2002.

[10] A. Nemirovski, C. Roos, and T. Terlaky, "On maximization of quadratic
form over intersection of ellipsoids with common center," Math. Program.,
ser. A, vol. 86, pp. 463-473, 1999.

[11] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization
over symmetric cones," Optim. Meth. Softw., vol. 11-12, pp. 625-653, 1999.

[12] P. Tseng, "Further results on approximating nonconvex quadratic
optimization by semidefinite programming relaxation," SIAM J. Optim.,
vol. 14, no. 1, pp. 268-283, Jul. 2003.

[13] S. A. Vorobyov, A. B. Gershman, and Z.-Q. Luo, "Robust adaptive
beamforming using worst-case performance optimization: A solution to the
signal mismatch problem," IEEE Trans. Signal Process., vol. 51, no. 2,
pp. 313-324, Feb. 2003.

[14] H. Wolkowicz, "Relaxations of Q2P," in Handbook of Semidefinite
Programming: Theory, Algorithms, and Applications, H. Wolkowicz, R.
Saigal, and L. Vandenberghe, Eds. Norwell, MA: Kluwer, 2000, ch. 13.4.

[15] S. Zhang, "Quadratic maximization and semidefinite relaxation," Math.
Program., ser. A, vol. 87, pp. 453-465, 2000.

[16] Z.-Q. Luo, N. D. Sidiropoulos, P. Tseng, and S. Zhang, "Approximation
bounds for quadratic optimization with homogeneous quadratic constraints,"
SIAM J. Optim., Oct. 2005, submitted for publication.


A Hybrid Probabilistic Data
Association-Sphere Decoding Detector for
Multiple-Input-Multiple-Output Systems

Georgios Latsoudas and Nicholas D. Sidiropoulos, Senior Member, IEEE


Abstract: A hybrid probabilistic data association (PDA)-sphere decoding
(SD) algorithm is proposed for signal detection in
multiple-input-multiple-output (MIMO) systems. The key idea is to reduce
the dimension of the problem solved via SD by first running a single stage
of the PDA to fix symbols that can be decoded with high reliability.
Simulations under a multiple antenna Rayleigh fading scenario show that
this two-step algorithm attains a considerably better
performance-complexity tradeoff than SD and PDA for low to moderate
signal-to-noise ratio (SNR) or higher problem dimensions.

Index Terms: Integer least squares, multiple-input-multiple-output (MIMO)
decoding, probabilistic data association (PDA), sphere decoder.


I.  Introduction 

MULTIPLE antenna systems have attracted great interest in recent years,
due to the capacity improvement that they afford. Vertical Bell
Laboratories Layered Space Time (V-BLAST) [3] is a widely known multiple
antenna spatial multiplexing system targeting high spectral efficiencies.
Unfortunately, the associated maximum-likelihood (ML) detector amounts to
a constrained integer least-squares problem, whose exact solution entails
exhaustive search. Thus, following the so-called nulling and cancelling
detector [3], several computationally efficient detection algorithms have
been developed for or adapted to V-BLAST.

Sphere Decoding (SD) [11], Probabilistic Data Association (PDA) [8], [10],
and Semi-Definite Relaxation (SDR) [9] are three
multiple-input-multiple-output (MIMO) detectors that can provide
near-optimal performance at relatively low complexity in certain
scenarios. Among them, SD appears to be prevalent in the recent
literature. Numerous variants and improvements of SD have recently been
developed, e.g., [1], [2], [12], and [13], incorporating more
sophisticated schemes for increasing the associated search radius and
organizing the computations in a more efficient manner, e.g., the
Schnorr-Euchner (SE) SD, which uses an improved search strategy [1], [2].
A drawback of the SD family of detectors is that, for close-to-ML
performance, complexity remains high in the low signal-to-noise ratio
(SNR) regime or when the number of symbols to be jointly detected is
large [5], [6].

The PDA is a simpler detection method, which, however, generally provides
worse performance than SD. SD, PDA, SDR, and several other algorithms have
recently been compared in the context of code division multiple access
(CDMA) multiuser detection [4]. A corresponding comparison for the
multiple antenna Rayleigh fading scenario (as in V-BLAST) has not been
undertaken, to the best of our knowledge. Thorough comparisons are
nontrivial, because complexity and performance of SD and SDR depend on a
number of parameters. Our experience in [7] indicates that SDR is inferior
to SD at high SNR.

In this letter, we propose a hybrid PDA-SD algorithm that attains a better
performance-complexity tradeoff than either of its constituent components.
At each stage of the decoding process, the PDA produces a set of soft
decision metrics that can be used to assess how reliable associated hard
decisions would be at that point. The basic idea, then, is to execute a
single stage of the PDA algorithm and fix those symbols that can be
detected with high reliability. After cancelling the effect of those
symbols, a reduced-dimensionality problem is passed to SD for decoding.
This reduces the complexity of SD and improves the performance of PDA. Our
simulations show that the proposed algorithm enjoys an error performance
close to that of SD over a wide range of SNR, at a significantly reduced
computational cost.

We use the SD algorithm in Viterbo-Boutros (VB-SD) [11], with an initial
radius chosen according to [5], and the SE-SD in [1] and [2], with a
search radius set to infinity. Note, however, that the initial PDA stage
can also be combined with other variants of SD or SDR. The key here is
that dimensionality reduction via single-stage PDA preprocessing can
provide significant computational relief at a small performance cost.

II.  System  Model 

The  aforementioned  techniques  are  applicable  to  a  broad 
range  of  MIMO  communication  systems.  Herein,  we  focus  on 
V-BLAST  for  concreteness.  V-BLAST  is  a  symbol  synchro¬ 
nized  multiple  antenna  system  with  rir  transmit  and  hr  receive 
antennas,  with  nr  <  ur-  The  input  stream  of  bits  is  mapped 
to  a  particular  constellation,  and  the  resulting  symbol  stream 
is  demultiplexed  into  hr  substreams.  The  transmissions  are 
organized  into  bursts  of  L  symbol  periods.  It  is  assumed  that 
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the  channel  is  frequency  flat  and  block  fading  (i.e.,  its  variation 
is  negligible  over  the  L  symbol  periods  comprising  a  burst  and 
random  from  one  burst  to  the  next).  The  channel  is  assumed  to 
be  known  to  the  receiver  but  not  to  the  transmitter.  From  the 
discrete-time  baseband-equivalent  viewpoint,  the  system  can 
be  represented  as 


r  =  ,/ — As- 

tlt 


n  =  Hs  +  n 


(1) 


that  is,  set  ,sy  =  sign(p(i)  —  0.5),  Vi  6  D  and  collect  these  de¬ 
cisions  in  a  vector  s/>  Now,  expand  (6)  as 


r=[HD  HS] 


S  D 
SD 


+  n 


with  obvious  notation.  Assuming  perfect  decisions  for  the  bits  in 
D  (that  is,  Sd  =  Sjj),  the  residual  subsystem  after  cancellation 


where  f  =  [ri,r2, . . .  ,fnR]T,s  =  [%,  s2,  v  snT]T  are  the re¬ 
ceive  and  the  transmit  vector,  respectively,  A  is  a  generally  com¬ 
plex  ur  x  tit  channel  matrix  with  entries  dij,  and  n  is  a  white 
Gaussian  circularly  symmetric  tir  x  1  noise  vector  with  covari¬ 
ance  matrix  2<t2I.  The  normalized  amplitude  (p/nr)  ensures 
that  the  SNR  is  constant  for  a  given  noise  variance,  irrespective 
of  tit ■  Assuming  rich  scattering,  the  elements  of  A  are  mod¬ 
eled  as  independent  and  identically  distributed  (i.i.d.)  circularly 
symmetric  Gaussian  variables  with  zero  mean  and  unit  variance 
of  the  real  and  imaginary  parts.  For  simplicity,  we  assume  that 
the  transmitted  symbols  are  taken  from  a  4-QAM  constellation, 
but  the  ideas  generalize  to  higher  order  constellations.  In  order 
to  transform  the  above  model  to  a  real-valued  one,  define 


s  :=  [)ft(sT) 

3(sT)f 

(2) 

r:=[ 

K{fT} 

3{rT}f 

(3) 

A  — 

"K{A} 

-A{A}1 

(4) 

_r\_  . — 

A{A} 

K{A} 

n  :=  [ 

S{nT} 

A{AT}f 

(5) 

where  3?,  ?s  denote  the  real  and  the  imaginary  part,  respectively. 
Using  the  above  vectors  and  matrices,  we  obtain  the  real-valued 
vector  equation 


r  =  t  /  — As  +  n  =  Hs  +  n. 

rix 


(6) 


III.  Hybrid  Algorithm 

The  hybrid  algorithm  consists  of  the  following  steps.  As  in 
[10],  we  premultiply  (6)  with  HT,  which  yields 


z  =  HTr=Gs  +  v  (7) 

where  G  :=  HTH  is  a  symmetric  positive  definite1  matrix,  and 
v  =  HTn  is  a  noise  vector  with  covariance  matrix  <r2G.  We 
then  apply  one  stage  of  the  PDA  detector  (steps  1-5  in  [8])  to 
the  system  in  (7)  and,  thus,  obtain  a  vector  p  that  contains  the 
associated  probabilities  for  the  elements  of  s.  Let  D  denote  the 
subset  of  bits  that  satisfy 

p(i)  G  [0,r]  U  [1 -r,  1]  (8) 

with  r  to  be  suitably  chosen.  D  will  henceforth  denote  the  com¬ 
plement  of  D.  We  then  make  hard  decisions  for  the  bits  in  D, 


'With  probability  1,  under  the  i.i.d.  Rayleigh  assumption. 


yp  :=  r  —  UDsD  =  UDsD  +  n. 


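After compacting

$$\mathbf{y}_c := \mathbf{H}_{\bar{D}}^T\mathbf{y}_p = \mathbf{H}_{\bar{D}}^T\mathbf{H}_{\bar{D}}\mathbf{s}_{\bar{D}} + \mathbf{H}_{\bar{D}}^T\mathbf{n} = \mathbf{G}_{\bar{D}\bar{D}}\mathbf{s}_{\bar{D}} + \mathbf{v}_{\bar{D}}$$

the noise vector $\mathbf{v}_{\bar{D}}$ is colored Gaussian with zero mean and
covariance matrix $\sigma^2\mathbf{G}_{\bar{D}\bar{D}}$. Introduce the Cholesky factorization

$$\mathbf{G}_{\bar{D}\bar{D}} = \mathbf{L}_{\bar{D}\bar{D}}\mathbf{L}_{\bar{D}\bar{D}}^T \qquad (9)$$

and premultiply the system with $\mathbf{L}_{\bar{D}\bar{D}}^{-1}$ to obtain

$$\mathbf{x} := \mathbf{L}_{\bar{D}\bar{D}}^{-1}\mathbf{y}_c = \mathbf{L}_{\bar{D}\bar{D}}^T\mathbf{s}_{\bar{D}} + \mathbf{w} \qquad (10)$$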


where the noise vector $\mathbf{w}$ is white Gaussian with covariance
matrix $\sigma^2\mathbf{I}$. We now apply SD to (10). Let $K$ be the number of
elements in $\bar{D}$. As suggested in [5], the initial radius for VB-SD
is set to $C = \alpha K\sigma^2$, with $\alpha$ such that

$$\int_0^{\alpha K/2} \frac{x^{K/2-1}}{\Gamma(K/2)}\, e^{-x}\, dx = 0.99. \qquad (11)$$
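The left-hand side of (11) is the regularized lower incomplete gamma function evaluated at $(K/2, \alpha K/2)$, so $\alpha$ can be found by a one-dimensional root search. The sketch below is one possible way to do this with the Python standard library only; the function names, the power-series evaluation and the bisection are choices made for this illustration, not part of the original algorithm description.

```python
import math


def reg_lower_gamma(a, x, max_terms=2000):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a            # k = 0 term of sum_k x^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < 1e-17 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def initial_radius_scale(K, prob=0.99, iters=200):
    """Solve (11) for alpha, assuming K >= 1 undecided real dimensions."""
    a = K / 2.0
    lo, hi = 0.0, a + 10.0
    while reg_lower_gamma(a, hi) < prob:   # grow the bracket if needed
        hi *= 2.0
    for _ in range(iters):                 # bisection on the upper limit
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < prob:
            lo = mid
        else:
            hi = mid
    return 2.0 * hi / K


def initial_radius(K, sigma2, prob=0.99):
    """Initial squared radius C = alpha * K * sigma^2."""
    return initial_radius_scale(K, prob) * K * sigma2
```

For $K = 2$ the integral in (11) reduces to $1 - e^{-\alpha}$, so the routine should return $\alpha = \ln 100$; as $K$ grows, $\alpha$ decreases toward 1.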


Alternatively, SE-SD can be used in the second stage of the hybrid
algorithm. We try both VB-SD and SE-SD in our simulations.


Threshold  Parameter 

The threshold parameter $\tau$ should be small enough to ensure
that the PDA stage makes reliable decisions. On the other hand,
$\tau$ should not be too small, for otherwise, the inclusion of the
PDA stage will yield little if any dimensionality reduction benefit.

While it is clear that $\tau$ should be made smaller with increasing
SNR, choosing it based on analytical considerations appears
intractable. Our experience is that the following choice is
reasonable: $\tau = 10^{-\mu}$ [hard-limited within (0, 0.45]], with
$\mu := 3.5\,(8\sigma^2/\rho)^{-1.55}$. This setting is well supported by our
simulation results, which are reported next.
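As a small illustration of the rule above, the following Python sketch evaluates the threshold for a given $\rho$ and $\sigma^2$. It assumes the reading $\tau = 10^{-\mu}$ with $\mu = 3.5\,(8\sigma^2/\rho)^{-1.55}$ and an upper limit of 0.45; the function name and argument names are hypothetical.

```python
def pda_threshold(rho, sigma2, cap=0.45):
    """Hard-decision threshold tau = 10**(-mu), limited from above by cap.

    Assumes rho > 0 and sigma2 > 0. For very high SNR, 10**(-mu) may
    underflow to 0.0 in floating point, which no longer lies in (0, cap].
    """
    mu = 3.5 * (8.0 * sigma2 / rho) ** (-1.55)
    return min(10.0 ** (-mu), cap)
```

At $\rho/\sigma^2 = 1$ (0 dB) the limit of 0.45 is active, while at $\rho/\sigma^2 = 10$ (10 dB) the threshold is already of the order of $10^{-5}$, so only very confident PDA outputs are frozen at moderate SNR.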

IV.  Simulation  Results 

In our simulations, each burst comprises $L = 100$ symbol
intervals. Over each symbol interval, $n_T$ 4-QAM symbols
($\pm(1/\sqrt{2}) \pm j(1/\sqrt{2})$) are simultaneously transmitted. For
each burst, a new realization of the Rayleigh channel matrix is
generated. For the bit-error rate (BER) plots, we use a dynamic
Monte Carlo simulation: For each SNR, the simulation stops
when both the number of errors has reached 150 and the
number of bursts has reached five. This ensures sufficient
averaging in the low error rate regime while avoiding unnecessarily
long runs in the high error rate regime. For the computational
complexity plots, we use $10^4$ (100 bursts of 100 symbol vectors
each) Monte Carlo runs per datum reported.

Fig. 1. Probability of error comparison for 4-QAM with $n_T = n_R = 16$.
Dynamic Monte Carlo simulation.

Fig. 2. Computational cost versus SNR, $n_T = n_R = 16$, 4-QAM. $10^4$ Monte
Carlo runs.
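The stopping rule of the dynamic Monte Carlo simulation can be sketched as follows. This is an illustrative Python skeleton only: `simulate_burst` is a hypothetical callback standing in for one burst of channel generation and detection, `make_toy_burst` is a toy substitute with i.i.d. bit errors, and the `max_bursts` guard is an added assumption to keep the loop finite.

```python
import random


def dynamic_monte_carlo(simulate_burst, min_errors=150, min_bursts=5,
                        max_bursts=10 ** 6):
    """Run bursts until BOTH the error count and the burst count are reached."""
    errors = bits = bursts = 0
    while (errors < min_errors or bursts < min_bursts) and bursts < max_bursts:
        burst_errors, burst_bits = simulate_burst()
        errors += burst_errors
        bits += burst_bits
        bursts += 1
    return errors / bits, bursts


def make_toy_burst(bit_error_prob, bits_per_burst=3200, seed=0):
    """Toy stand-in for a detector: 100 intervals x 16 symbols x 2 bits."""
    rng = random.Random(seed)

    def simulate_burst():
        errs = sum(rng.random() < bit_error_prob for _ in range(bits_per_burst))
        return errs, bits_per_burst

    return simulate_burst
```

With a high error probability the loop stops after the minimum of five bursts, whereas with a low error probability it keeps running until 150 errors have accumulated, which mirrors the averaging behavior described above.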

The implementation of PDA does not incorporate the bit-flip
stage [8]. The internal threshold parameter of PDA is set to
$\epsilon = 10^{-2}/(4\,\mathrm{SNR})$ as in [8] (note that this is different from our
hard decoding threshold $\tau$). The initial radius of SD is set as in
Section III; if SD fails to find a point inside the sphere, the
radius is increased by one, up to five times (six searches at most).
For the SE-SD algorithm, we set the search radius to infinity,
which ensures that the ML solution will be found.

Fig. 1 shows the BER performance of PDA, SD,
and the hybrid PDA-SD algorithm as a function of
SNR $:= 10\log_{10}(\rho/\sigma^2)$, for $n_T = n_R = 16$. Fig. 2
shows the associated average and worst-case computational
costs per symbol vector, measured in Floating Point Operations
(FLOPS). Finally, Fig. 3 shows FLOPS versus $n_T$, with
$n_T = n_R$, for SNR = 10 dB.

Fig. 3. Computational cost versus $n_T$, $n_T = n_R$, 4-QAM, SNR = 10 dB,
$10^4$ Monte Carlo runs. (Horizontal axis of both panels: number of
transmit antennas.)

V.  Conclusion 

We have presented a two-stage hybrid PDA-SD algorithm for
signal detection in MIMO systems. The basic idea is dimensionality
reduction via hard decoding and cancellation of those symbols
that can be quickly and reliably detected via a single PDA
stage. In the V-BLAST scenario considered, simulations show
that the proposed hybrid algorithm attains performance close to
SD, at a complexity close to PDA. The dimensionality reduction
idea can also be applied in conjunction with other variants
of SD or SDR.
Abstract — This paper considers the problem of simultaneous
multiuser downlink beamforming. The idea is to employ a transmit
antenna array to create multiple “beams” directed toward the
individual users, and the aim is to increase throughput, measured
by sum capacity. In particular, we are interested in the practically
important case of more users than transmit antennas, which
requires user selection. Optimal solutions to this problem can
be prohibitively complex for online implementation at the base
station and entail so-called Dirty Paper (DP) precoding for known
interference. Suboptimal solutions capitalize on multiuser (selection)
diversity to achieve a significant fraction of sum capacity at
lower complexity cost. We analyze the throughput performance
in Rayleigh fading of a suboptimal greedy DP-based scheme
proposed by Tu and Blum. We also propose another user-selection
method of the same computational complexity based on
simple zero-forcing beamforming. Our results indicate that the
proposed method attains a significant fraction of sum capacity
and throughput of Tu and Blum’s scheme and, thus, offers an
attractive alternative to DP-based schemes.

Index  Terms — Beamforming,  downlink,  multiuser  diversity. 


I.  Introduction 

TRANSMIT antenna arrays can be utilized in two basic
ways or a combination thereof: space-time coding and spatial
multiplexing. The former can be used without Channel State
Information (CSI) at the transmitter and allows mitigation of
fading and exploitation of transmit-receive diversity. However,
if CSI is known at the transmitter, higher throughput can be
attained using spatial multiplexing, which can be implemented
as multibeam transmit beamforming. Until recently, transmit
beamforming was mostly considered for voice services in the
context of the cellular downlink. With the emergence of third-
and fourth-generation (3G and 4G) systems, higher emphasis is
being placed on packet data, which are more delay-tolerant but
require much higher throughput. Hence, we have the recent
interest in transmit beamforming strategies for the cellular
downlink that aim to attain the sum capacity of the wireless channel
[1], [11], [13]-[16], [18], [19].

The  scenario  of  interest  can  be  modeled  as  a  nondegraded 
Gaussian  broadcast  channel  (GBC).  Let  N  be  the  number  of 
antennas  at  the  transmitter  [Base  Station  (BS)  in  a  cellular 
context],  and  consider  a  cluster  of  M  mobile  users,  each 
equipped  with  a  single  receive  antenna.  The  channel  between 
each  transmit  and  receive  antenna  is  constant  over  a  certain 
time  interval  and  is  known  at  the  BS.  The  received  signal  is 
corrupted  by  Additive  White  Gaussian  Noise  (AWGN)  that  is 
independent  across  users.  The  BS  may  transmit  simultaneously, 
using  multiple  transmit  beams,  to  more  than  one  user  in  the 
cluster. 

Since the receivers cannot cooperate, successful transmission
critically depends on the transmitter’s ability to simultaneously
send independent signals with as small interference between
them as possible. Caire and Shamai [1] proposed a multiplexing
technique based on coding for known interference, known as
“Writing on Dirty Paper,” Costa precoding [2], or dirty paper
(DP) coding. In [2], it is proven that in an AWGN channel with
additional additive Gaussian interference, which is known at the
transmitter in advance (noncausally), it is possible to achieve the
same capacity as if there were no interference. Assuming Costa
precoding and known channels at the transmitter, Vishwanath
et al. [14] and Yu and Cioffi [19] have proposed algorithms that
evaluate sum capacity of the GBC along with the associated
optimal signal covariance matrix. However, both approaches
require convex optimization in (order of) $MN$ variables to find
the optimal signal covariance matrix. Jindal et al. [7] have
recently proposed a more efficient iterative algorithm, which
requires $O(M^2N^2)$ operations per iteration.

The complexity of the aforementioned optimal strategies
can be problematic for online implementation, especially when
$M$ is large. A reduced-complexity suboptimal solution to sum
rate maximization is proposed in [1]. It suggests the use of QR
decomposition of the channel matrix combined with DP coding
at the transmitter. The combined approach nulls interference
between data streams, and hence, it is named zero-forcing
dirty-paper (ZF-DP) precoding. If $N \ge M$, ZF-DP is proven to
be asymptotically optimal at both low and high SNR but
suboptimal in general, whereas ZF beamforming without DP coding
is optimal in the low SNR regime and yields the same slope of
throughput versus SNR in decibels as the sum capacity curve
at high SNR. For the case of $N \ge M$, Spencer and Haardt
[11] considered ZF beamforming without DP coding, and
Samardzija and Mandayam [10] compared ZF beamforming
with QR-decomposition-based spatial prefiltering coupled with
DP coding.

If $N < M$, [1] has shown that random selection of $U \le N$
users incurs significant throughput loss for both ZF-DP and
ZF schemes. Tu and Blum [13] have proposed an algorithm
based on ZF-DP, with a greedy user-selection procedure, named
greedy ZF-DP (gZF-DP). In [13], it is shown by simulations that
the throughput of gZF-DP is a significant fraction of the sum
capacity. This is achieved by means of multiuser diversity. For the
case of $N < M$, Viswanathan et al. [16] considered the problem
of achieving any point in the capacity region and not only
maximum sum capacity. They proposed ZF beamforming coupled
with a user-selection scheme that schedules $N$ users using an
exhaustive search over a set of $K_t$ users with the highest
individual SINR ($N \le K_t \le M$). The throughput of this scheme
was compared to the throughput of a DP-coding-based optimal
algorithm, and it was reported that as $K_t$ approaches $M$, the
throughput of ZF with exhaustive user selection comes close to
the throughput of the optimal algorithm when each receiver has
one antenna [16].

An important shortcoming of DP coding is that it requires
vector coding, and depending on the SNR, it may require long
temporal block lengths to be well approximated in practice. In
particular, the required block length decreases as SNR increases,
with a block length of one being adequate at sufficiently high
SNR. At low and moderate SNR, a good approximation of DP
can be computationally demanding with the current state of the art
[8], [18], [20]. For this reason, we advocate herein a more
pragmatic approach, based on plain ZF beamforming.

Our goal is to investigate low-complexity downlink
beamforming solutions that come close to attaining sum capacity for
the practically important case wherein the number of downlink
users ($M$) is larger than the number of transmit antennas
($N$), which entails user selection. Our aim is three-fold: i)
Analyze gZF-DP to better understand the effects of multiuser
diversity; ii) propose a simpler greedy alternative, based on ZF
beamforming and dubbed ZFS, which does not use DP coding;
and iii) assess the performance of both gZF-DP and ZFS
relative to sum capacity. The key idea is that multiuser diversity can
largely make up for the use of simple linear processing in lieu of
more complex schemes. The performance analysis of gZF-DP
is useful in system design, and ZFS is appealing from a
practical standpoint. In particular, we will show that the complexity
of the selection procedure of the proposed algorithm is the same
as that of gZF-DP. Our simulation results indicate that at
moderate and high SNR, ZFS has equal slope of throughput versus
SNR as the gZF-DP and the capacity curve. It achieves a
significant fraction of throughput of the gZF-DP algorithm and
remains close to sum capacity for all SNR for a small to moderate
number of transmit antennas.

We note that an inherent drawback of the maximum sum
capacity criterion is the lack of fairness guarantees, at least in the
short run. While this could be compensated over a longer timeline
due to channel variations, it remains that certain users may
be completely shut off during a scheduling epoch. Whether this
is appropriate or not depends on the context; on this issue, see
also [1], [11], [13]-[16], [18], and [19].

The rest of the paper is organized as follows. The problem
of sum rate maximization is formulated in Section II. This is
followed by a review of the gZF-DP algorithm, a description
of the proposed ZFS algorithm, and a comparison of the
complexities of the two algorithms in Section III. In Section IV, the
throughput performance of the gZF-DP algorithm in independent
Rayleigh fading is analyzed. Simulation-based comparison
of the throughput performances of gZF-DP and ZFS is provided
in Section V. Conclusions are drawn in Section VI.

II.  Problem  Formulation 

Let $h_{m,n}$ model the quasistatic, flat-fading channel between
transmit antenna $n$ and the receive antenna of user $m$, and
denote $\mathbf{h}_m := [h_{m,1} \; h_{m,2} \; \cdots \; h_{m,N}]$. Note that $\mathbf{h}_m$ is a row
vector. Thus, the channel matrix $\mathbf{H}$ is

$$\mathbf{H} = [\mathbf{h}_1^* \;\; \mathbf{h}_2^* \;\; \cdots \;\; \mathbf{h}_M^*]^* \qquad (1)$$

where $(\cdot)^*$ denotes conjugate-transpose. rank($\mathbf{H}$) =
$\min(N, M)$ with probability 1, due to the assumed statistical
independence and continuous distribution of the channel
vectors. Throughout the paper, we are interested in the case
$N < M$ so that we assume that rank($\mathbf{H}$) = $N$. Collecting the
baseband-equivalent outputs, the received signal vector is

$$\mathbf{x} = \mathbf{H}\mathbf{y} + \mathbf{z} \qquad (2)$$

where $\mathbf{y}$ is the transmitted signal vector, and $\mathbf{z}$ is the noise
vector. The signal covariance matrix is $\mathbf{C}_y = E[\mathbf{y}\mathbf{y}^*]$. The
total transmit power is constrained to $P$. The sum capacity of
such a vector Gaussian broadcast channel is [15]

$$C = \sup_{\mathbf{C}_y \in \mathcal{A}} \log\det(\mathbf{I} + \mathbf{H}\mathbf{C}_y\mathbf{H}^*) \qquad (3)$$

where $\mathcal{A}$ is the set of $N$ by $N$ non-negative diagonal matrices
$\mathbf{C}_y$ with Trace[$\mathbf{C}_y$] $\le P$.

Using only linear spatial processing at the transmitter, which
is a suboptimal strategy, we obtain the following model. Let
$\mathbf{w}_m = [w_{1,m} \; w_{2,m} \; \cdots \; w_{N,m}]^T$ ($(\cdot)^T$ denotes transpose) be
the beamforming weight vector for user $m$. The beamforming
weight matrix $\mathbf{W}$ is

$$\mathbf{W} = [\mathbf{w}_1 \;\; \mathbf{w}_2 \;\; \cdots \;\; \mathbf{w}_M]. \qquad (4)$$

Collecting the baseband-equivalent outputs, the received signal
vector is

$$\mathbf{x} = \mathbf{H}\mathbf{W}\mathbf{D}\mathbf{s} + \mathbf{z} \qquad (5)$$

where $\mathbf{s}$ is the transmitted signal vector containing uncorrelated
unit-power entries, and

$$\mathbf{D} = \begin{bmatrix} \sqrt{p_1} & 0 & \cdots & 0 \\ 0 & \sqrt{p_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{p_M} \end{bmatrix} \qquad (6)$$

accounts for power loading (the columns of $\mathbf{W}$ are thus normalized
to unit norm). Note that the elements of $\mathbf{x}$ are physically
distributed across the $M$ mobile terminals. Multiuser decoding
is therefore not feasible; hence, each user treats the signals
intended for other users as interference. Noise is assumed to be
circular complex Gaussian, zero-mean, and uncorrelated with
variance of each complex entry $\sigma^2 = 1$.

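The desired signal power received by user $m$ is given by
$|\mathbf{h}_m\mathbf{w}_m|^2 p_m$. The Signal-to-Interference plus Noise Ratio
(SINR) of user $m$ is

$$\mathrm{SINR}_m = \frac{|\mathbf{h}_m\mathbf{w}_m|^2 p_m}{\sum_{i \ne m} |\mathbf{h}_m\mathbf{w}_i|^2 p_i + \sigma^2}. \qquad (7)$$

The linear beamforming problem can now be formulated as

$$\max_{\mathbf{W},\,\mathbf{D}} \; \sum_{m=1}^{M} \log_2(1 + \mathrm{SINR}_m) \quad \text{subject to} \quad \|\mathbf{W}\mathbf{D}\|_F^2 \le P \qquad (8)$$

where $\|\cdot\|_F$ denotes Frobenius norm, and $P$ stands for a bound
on average transmitted power.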
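To make the objective in (7)-(8) concrete, the following Python sketch evaluates the per-user SINRs and the resulting sum rate for a given channel, beamforming matrix and power allocation. It is only an evaluation routine under the stated model (unit-norm columns of $\mathbf{W}$, powers $p_m$ on the diagonal of $\mathbf{D}^2$), not a solver for (8); the function name and the nested-list data layout are assumptions of this illustration.

```python
import math


def sum_rate(H, W, p, sigma2=1.0):
    """Return (sum of log2(1 + SINR_m), list of SINR_m).

    H: M rows, each the 1 x N channel row vector h_m (complex entries).
    W: N rows by M columns; column m is the beamformer w_m.
    p: list of M transmit powers p_m.
    """
    M, N = len(H), len(H[0])
    sinrs = []
    total = 0.0
    for m in range(M):
        # gains[i] = |h_m w_i|^2: power gain from beam i to user m
        gains = [abs(sum(H[m][n] * W[n][i] for n in range(N))) ** 2
                 for i in range(M)]
        interference = sum(gains[i] * p[i] for i in range(M) if i != m)
        sinr = gains[m] * p[m] / (interference + sigma2)
        sinrs.append(sinr)
        total += math.log2(1.0 + sinr)
    return total, sinrs
```

For two users with orthogonal channels, identity beamformers and powers 3 and 1, the SINRs are 3 and 1 and the sum rate is $\log_2 4 + \log_2 2 = 3$ bits per channel use.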

Attaining  capacity  requires  Gaussian  signaling  and  long 
codes,  yet  the  logarithmic  SINR  reward  can  be  motivated  from 
other,  more  practical  perspectives  as  well.  It  can  be  shown  that 
it  measures  the  throughput  of  QAM-modulated  systems  over 
both  AWGN  and  Rayleigh  fading  channels.  The  intuition  is  that 
SINR  improvements  eventually  yield  diminishing  throughput 
returns. 


III.  Reduced-Complexity  Algorithms 
A.  Greedy  Zero-Forcing  Dirty-Paper  Algorithm 

In [1], Caire and Shamai have proposed a suboptimal solution
to (3) based on the QR-type decomposition [6] of the channel
matrix $\mathbf{H} = \mathbf{L}\mathbf{Q}$ obtained by applying Gram-Schmidt
orthogonalization to the rows of $\mathbf{H}$. $\mathbf{L}$ is a lower triangular matrix, and
$\mathbf{Q}$ has orthonormal rows. Setting $\mathbf{W} = \mathbf{Q}^*$, (5) yields a set of
interference channels

$$x_m = l_{m,m}\sqrt{p_m}\,s_m + \sum_{j<m} l_{m,j}\sqrt{p_j}\,s_j + z_m, \quad m = 1, \ldots, N \qquad (9)$$

while no information is sent to users $m = N+1, \ldots, M$.
In order to eliminate the interference term $I_m =
\sum_{j<m} l_{m,j}\sqrt{p_j}\,s_j$, the input signals $\sqrt{p_m}\,s_m$, for
$m = 1, \ldots, N$ are obtained by successive application of DP
coding, where for each $m$, the interference $I_m$ is noncausally
known. This particular choice of precoding matrix $\mathbf{W} = \mathbf{Q}^*$
nulls interference caused by users $j > m$ and DP coding nulls
interference caused by users $j < m$ so that the scheme forces
all interference to zero. Hence, it was dubbed ZF-DP coding.
The throughput of the ZF-DP scheme is given by [1]

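$$R_{\mathrm{zfdp}} = \sum_{m=1}^{N} [\log_2(\mu d_m)]_+ \qquad (10)$$

where $[x]_+ = \max\{0, x\}$, $d_m := |l_{m,m}|^2$, and $\mu$ is the solution
of the water-filling equation

$$\sum_{m=1}^{N} \left[\mu - \frac{1}{d_m}\right]_+ = P. \qquad (11)$$

Then, for $m = 1, \ldots, N$

$$p_m = \left[\mu - \frac{1}{d_m}\right]_+. \qquad (12)$$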
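The water-filling step in (10)-(12) has a simple closed-form search over the number of active streams. The Python sketch below is one standard way to compute it and is offered as an illustration; the function names are hypothetical, and it assumes $P > 0$ and strictly positive gains $d_m$.

```python
import math


def water_fill(gains, P):
    """Return (mu, powers) with sum_m [mu - 1/d_m]_+ = P, cf. (11)-(12)."""
    inv = sorted(1.0 / d for d in gains)      # smallest 1/d = strongest stream
    mu = 0.0
    for k in range(len(inv), 0, -1):          # try k active streams
        mu = (P + sum(inv[:k])) / k
        if mu > inv[k - 1]:                   # weakest active stream gets power
            break
    powers = [max(0.0, mu - 1.0 / d) for d in gains]
    return mu, powers


def zfdp_rate(gains, P):
    """Throughput (10) in bits per channel use for the given gains d_m."""
    mu, _ = water_fill(gains, P)
    return sum(max(0.0, math.log2(mu * d)) for d in gains)
```

For gains $d = (2, 1, 0.1)$ and $P = 1$, the weakest stream is switched off, the water level is $\mu = 1.25$, and the powers are $(0.75, 0.25, 0)$.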


Note that when $N < M$, one has to select up to $N$ out of
$M$ users whose data will be transmitted. In general, different
selections yield different values of $R_{\mathrm{zfdp}}$ in (10). Furthermore,
different ordering within the same set of users yields different
sum rate. The ZF-DP scheme does not attempt to optimize the
throughput with respect to either user selection or ordering. In
[13], Tu and Blum have proposed a greedy algorithm for the
selection of $N$ out of $M$ rows of the channel matrix $\mathbf{H}$ and
ordering of the selected rows in the Gram-Schmidt
orthogonalization, aiming to maximize the throughput. The algorithm is
called greedy ZF-DP and is presented here for convenience.

Let $U = \{1, 2, \ldots, M\}$ denote the set of indices of all
$M$ users, and let $S_n = \{s_1, \ldots, s_n\} \subset U$ denote the set of $n$
selected users ($|S_n| = n$).


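1) Initialization:

• Set $n = 1$.

• Let $r_{1,u} = \mathbf{h}_u\mathbf{h}_u^*$. Find a user $s_1$ such that
$s_1 = \arg\max_{u \in U} r_{1,u}$.

• Set $S_1 = \{s_1\}$.

2) While $n < N$:

• Increase $n$ by 1.

• Project each remaining channel vector
onto the orthogonal complement of the subspace
spanned by the channels of the selected
users. The projector matrix is

$$\mathbf{P}_n^\perp = \mathbf{I}_N - \mathbf{H}(S_{n-1})^* \left(\mathbf{H}(S_{n-1})\mathbf{H}(S_{n-1})^*\right)^{-1} \mathbf{H}(S_{n-1}) \qquad (13)$$

where $\mathbf{I}_N$ is the $N \times N$ identity matrix, and
$\mathbf{H}(S_{n-1})$ denotes the row-reduced channel
matrix consisting of the channel vectors
of the users selected in the first $n - 1$
steps

$$\mathbf{H}(S_{n-1}) = [\mathbf{h}_{s_1}^* \;\; \mathbf{h}_{s_2}^* \;\; \cdots \;\; \mathbf{h}_{s_{n-1}}^*]^*. \qquad (14)$$

Let $r_{n,u} = \|\mathbf{h}_u\mathbf{P}_n^\perp\|^2$. Due to idempotence of
$\mathbf{P}_n^\perp$, we have

$$r_{n,u} = \mathbf{h}_u\mathbf{P}_n^\perp\mathbf{h}_u^*. \qquad (15)$$

• Find a user $s_n$ such that

$$s_n = \arg\max_{u \in U \setminus S_{n-1}} r_{n,u}. \qquad (16)$$

• Set $S_n = S_{n-1} \cup \{s_n\}$.

3) Beamforming: Let $\mathbf{W} = \mathbf{Q}^*$, where $\mathbf{H}(S_N) =
\mathbf{L}\mathbf{Q}$ is the QR-type decomposition of $\mathbf{H}(S_N)$.

4) DP coding: Applied to the rows of $\mathbf{L}$.

5) Power Loading: Water-filling.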
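The selection part of the greedy procedure above (steps 1 and 2) can be sketched in a few lines of Python. The sketch replaces the explicit projector (13) by an equivalent running Gram-Schmidt update, which is an implementation choice of this illustration rather than something prescribed by [13]; the function names are hypothetical, and the channel rows are assumed linearly independent so that no selected residual has zero norm.

```python
def inner(a, b):
    """a b^* for complex row vectors a and b."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def gzfdp_select(H, N):
    """Greedy user selection; returns (ordered users, gains d_1..d_N).

    H is a list of M row vectors h_u. After n - 1 selections, residual[u]
    holds h_u projected onto the orthogonal complement of the selected
    channels, so its squared norm equals r_{n,u} in (15).
    """
    M = len(H)
    residual = [list(h) for h in H]
    selected, d = [], []
    for _ in range(N):
        candidates = [u for u in range(M) if u not in selected]
        best = max(candidates,
                   key=lambda u: inner(residual[u], residual[u]).real)
        r_best = inner(residual[best], residual[best]).real
        selected.append(best)
        d.append(r_best)                       # d_n = max_u r_{n,u}, cf. (18)
        q = [x / r_best ** 0.5 for x in residual[best]]   # new row of Q
        for u in range(M):                     # deflate all channels by q
            coeff = inner(residual[u], q)
            residual[u] = [x - coeff * y for x, y in zip(residual[u], q)]
    return selected, d
```

The returned gains $d_n$ can be passed directly to a water-filling routine to obtain the throughput in (10).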


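The rows $\mathbf{q}_m$ of $\mathbf{Q}$ in the QR decomposition of $\mathbf{H}(S_n) =
\mathbf{L}(S_n)\mathbf{Q}(S_n)$ are obtained by applying Gram-Schmidt
orthogonalization to the ordered rows of $\mathbf{H}(S_n)$: $\mathbf{h}_{s_1}, \ldots, \mathbf{h}_{s_n}$. This
yields [1]

$$\mathbf{P}_n^\perp = \mathbf{I}_N - \sum_{j \in S_{n-1}} \mathbf{q}_j^*\mathbf{q}_j. \qquad (17)$$

From $\mathbf{L}(S_n) = \mathbf{H}(S_n)\mathbf{Q}(S_n)^*$, we obtain $l_{n,n} = \mathbf{h}_{s_n}\mathbf{q}_n^*$. By
definition of $d_n$ (10), orthonormality of $\mathbf{Q}(S_n)$, and (17), we
have

$$d_n = |l_{n,n}|^2 = \mathbf{h}_{s_n}\mathbf{P}_n^\perp\mathbf{h}_{s_n}^*.$$

From (15) and (16), it follows that

$$d_n = \max_{u \in U \setminus S_{n-1}} r_{n,u} \qquad (18)$$

for $n = 1, \ldots, N$. In other words, the gZF-DP algorithm
maximizes $d_n$, conditioned on the choice of $d_1, \ldots, d_{n-1}$.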

B.  ZF  With  User  Selection 

ZF beamforming inverts the channel matrix at the transmitter
so that orthogonal channels between transmitter and receivers
are created. It is then possible to encode users individually, as
opposed to the more complex long-block vector coding generally
needed to implement DP. Note that ZF at the transmitter
does not enhance noise at the receiver, but it incurs an excess
transmission power penalty relative to ZF-DP. If $M \le N$, and
rank($\mathbf{H}$) = $M$, then the ZF beamforming matrix is

$$\mathbf{W} = \mathbf{H}^*(\mathbf{H}\mathbf{H}^*)^{-1} \qquad (19)$$

which is the Moore-Penrose pseudoinverse of the channel matrix.
However, if $M > N$, it is not possible to use (19) because
$\mathbf{H}\mathbf{H}^*$ is singular. In that case, one needs to select $n \le N$ out of
$M$ users.

For $M > N$, the problem (8) is reformulated as follows:
Given $\mathbf{H} \in \mathbb{C}^{M \times N}$, select $n \le N$ and a set of channels
$\{\mathbf{h}_{s_1}, \ldots, \mathbf{h}_{s_n}\}$, which produce the row-reduced channel matrix

$$\mathbf{H}(S_n) = [\mathbf{h}_{s_1}^* \;\; \mathbf{h}_{s_2}^* \;\; \cdots \;\; \mathbf{h}_{s_n}^*]^*$$

such that the sum rate is the highest achievable:

$$\max_{1 \le n \le N} \; \max_{S_n} \; R_{\mathrm{zf}}(S_n) \quad \text{subject to} \quad \sum_{i \in S_n} \left[\mu - \frac{1}{c_i(S_n)}\right]_+ = P. \qquad (20)$$

The throughput of ZF algorithm is given by [1]

$$R_{\mathrm{zf}}(S_n) = \sum_{i \in S_n} [\log_2(\mu\, c_i(S_n))]_+ \qquad (21)$$

where

$$c_i(S_n) = \left\{\left[\left(\mathbf{H}(S_n)\mathbf{H}(S_n)^*\right)^{-1}\right]_{i,i}\right\}^{-1} \qquad (22)$$

and $\mu$ is obtained by solving the water-filling equation in (20).
The power-loading then yields

$$p_i = c_i(S_n)\left[\mu - \frac{1}{c_i(S_n)}\right]_+, \quad \forall\, i \in S_n. \qquad (23)$$

The problem can be conceptually solved by exhaustive search:
For each value of $n$, find all possible $n$-tuples $S_n$ and select a
pair $(n, S_n)$, which yields maximum $R_{\mathrm{zf}}(S_n)$. However, such
an algorithm has prohibitive complexity.

We propose a reduced-complexity suboptimal algorithm,
dubbed ZF with Selection (ZFS), as outlined next.


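1) Initialization:

• Set $n = 1$.

• Find a user, $s_1$, such that
$s_1 = \arg\max_{u \in U} \mathbf{h}_u\mathbf{h}_u^*$.

• Set $S_1 = \{s_1\}$ and denote the achieved
rate $R_{\mathrm{zf}}(S_1)_{\max}$.

2) While $n < N$:

• Increase $n$ by 1.

• Find a user, $s_n$, such that
$s_n = \arg\max_{u \in U \setminus S_{n-1}} R_{\mathrm{zf}}(S_{n-1} \cup \{u\})$.

• Set $S_n = S_{n-1} \cup \{s_n\}$, and denote the
achieved rate $R_{\mathrm{zf}}(S_n)_{\max}$.

• If $R_{\mathrm{zf}}(S_n)_{\max} \le R_{\mathrm{zf}}(S_{n-1})_{\max}$, break, and
decrease $n$ by 1.

3) Beamforming: $\mathbf{W} = \mathbf{H}(S_n)^*\left(\mathbf{H}(S_n)\mathbf{H}(S_n)^*\right)^{-1}$.

4) Power Loading: Water-filling.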
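The ZFS selection loop can be sketched as follows. This Python illustration favors brevity over efficiency: the effective gains $c_i(S_n)$ are obtained through orthogonal projections computed by Gram-Schmidt (equivalent to (22) for linearly independent channels) rather than through the matrix inversion lemma, and the candidate-discarding shortcuts of the text are omitted. The function names and the stopping test on a non-increasing rate are assumptions of this sketch.

```python
import math


def inner(a, b):
    """a b^* for complex row vectors a and b."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def residual_energy(h, others):
    """Squared norm of h after projecting out span(others).

    Assumes the vectors in `others` are linearly independent.
    """
    basis = []
    for v in others:                          # Gram-Schmidt on the others
        w = list(v)
        for q in basis:
            c = inner(w, q)
            w = [x - c * y for x, y in zip(w, q)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([x / norm for x in w])
    r = list(h)
    for q in basis:
        c = inner(r, q)
        r = [x - c * y for x, y in zip(r, q)]
    return inner(r, r).real


def zf_rate(H, S, P):
    """R_zf(S) from (21), with mu from the water-filling equation in (20)."""
    c = [residual_energy(H[i], [H[j] for j in S if j != i]) for i in S]
    inv = sorted(1.0 / x for x in c)
    for k in range(len(inv), 0, -1):
        mu = (P + sum(inv[:k])) / k
        if mu > inv[k - 1]:
            break
    return sum(max(0.0, math.log2(mu * x)) for x in c)


def zfs_select(H, N, P):
    """Greedy ZFS selection; returns (selected users, achieved rate)."""
    M = len(H)
    first = max(range(M), key=lambda u: inner(H[u], H[u]).real)
    S, best_rate = [first], zf_rate(H, [first], P)
    while len(S) < N:
        candidates = [u for u in range(M) if u not in S]
        u_best = max(candidates, key=lambda u: zf_rate(H, S + [u], P))
        rate = zf_rate(H, S + [u_best], P)
        if rate <= best_rate:                 # no gain from one more user
            break
        S, best_rate = S + [u_best], rate
    return S, best_rate
```

With two orthogonal users of unequal strength, the sketch serves both users at high power and only the stronger one at low power, in line with the behavior of ZF as $P$ decreases.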

C.  Complexity  and  Implementation 

We consider complexity of the user selection procedure only.
The complexity of DP coding, required by the gZF-DP
algorithm, depends on its implementation, in particular, the degree of
approximation and the associated spatio-temporal block length
(which is a function of SNR), cf. [4], [18].

Complexity of the user selection procedure of the gZF-DP
algorithm is $O(N^3M)$. To see this, note that for each $n \le N$,
the algorithm evaluates $M - n + 1$ 2-norms $r_{n,u}$. Evaluation of
$r_{n,u}$ involves a vector-matrix multiplication, where the vector
is $1 \times N$ and the matrix $N \times N$. The complexity of this step is
$O(N^2)$. Repeating this over $O(M)$ users in $N$ steps, we obtain
$O(N^3M)$.

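We will show that the complexity of the user selection procedure
of the ZFS algorithm is also $O(N^3M)$. Again, for each $n \le
N$, the ZFS algorithm evaluates $M - n + 1$ rates $R_{\mathrm{zf}}(S_{n-1} \cup
\{u\})$. The evaluation of $R_{\mathrm{zf}}(S_{n-1} \cup \{u\})$ is split into the
evaluation of the $c_i(S_{n-1} \cup \{u\})$'s, followed by evaluation of
$\mu$, cf. (21). An efficient way to evaluate the $c_i(S_{n-1} \cup \{u\})$'s
is by using the matrix inversion lemma to invert the matrix
$\mathbf{A}(S_{n-1} \cup \{u\}) := \mathbf{H}(S_{n-1} \cup \{u\})\mathbf{H}(S_{n-1} \cup \{u\})^*$. Note
that

$$\mathbf{A}(S_{n-1} \cup \{u\}) = \begin{bmatrix} \mathbf{A}(S_{n-1}) & \mathbf{a}_u \\ \mathbf{a}_u^* & a_{u,u} \end{bmatrix}$$

where $\mathbf{a}_u = [\mathbf{h}_{s_1}\mathbf{h}_u^*, \; \mathbf{h}_{s_2}\mathbf{h}_u^*, \; \ldots, \; \mathbf{h}_{s_{n-1}}\mathbf{h}_u^*]^T$, and $a_{u,u} =
\mathbf{h}_u\mathbf{h}_u^*$. Noting that $\mathbf{A}(S_{n-1})^* = \mathbf{A}(S_{n-1})$ and writing

$$\mathbf{q} = \mathbf{A}(S_{n-1})^{-1}\mathbf{a}_u \qquad (24)$$

after some algebraic manipulation, we obtain

$$\mathbf{A}(S_{n-1} \cup \{u\})^{-1} = \begin{bmatrix} \mathbf{A}(S_{n-1})^{-1} & \mathbf{0}_{n-1}^T \\ \mathbf{0}_{n-1} & 0 \end{bmatrix} + (a_{u,u} - \mathbf{a}_u^*\mathbf{q})^{-1} \begin{bmatrix} \mathbf{q} \\ -1 \end{bmatrix} \begin{bmatrix} \mathbf{q}^* & -1 \end{bmatrix} \qquad (25)$$

where $\mathbf{0}_{n-1} = [0 \; 0 \; \ldots \; 0]_{1 \times (n-1)}$. It can be verified that
each time $n$ is increased, $\mathbf{A}(S_{n-1})^{-1}$ and $a_{i,u}$, $i \in S_{n-2}$ are
known before the search over $u \in U \setminus S_{n-1}$ starts. Hence,
evaluation of $\mathbf{A}(S_{n-1} \cup \{u\})^{-1}$ from (24) and (25) has complexity
proportional to $O(n^2)$. Repeating this over $O(M)$ users in each
of $n \le N$ steps, we obtain the overall complexity of the
user-selection procedure of the ZFS algorithm to be $O(N^3M)$.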
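The bordered-inverse update in (24) and (25) is easy to check on a small example. The Python sketch below implements the update for a Hermitian matrix stored as nested lists; the function names are hypothetical, and the code assumes the Schur complement $a_{u,u} - \mathbf{a}_u^*\mathbf{q}$ is nonzero, which holds when the candidate channel is linearly independent of the selected ones.

```python
def bordered_inverse(A_inv, a_u, a_uu):
    """Inverse of [[A, a_u], [a_u^*, a_uu]] given A_inv = A^{-1}.

    Costs O(k^2) operations for a k x k block A, as used in the ZFS
    complexity argument.
    """
    k = len(a_u)
    # q = A^{-1} a_u, eq. (24)
    q = [sum(A_inv[i][j] * a_u[j] for j in range(k)) for i in range(k)]
    # Schur complement a_uu - a_u^* q
    schur = a_uu - sum(a_u[i].conjugate() * q[i] for i in range(k))
    v = q + [-1.0]                            # stacked vector [q; -1]
    return [[(A_inv[i][j] if i < k and j < k else 0.0)
             + v[i] * v[j].conjugate() / schur
             for j in range(k + 1)]
            for i in range(k + 1)]            # eq. (25)


def effective_gains(A_inv_full):
    """c_i = 1 / [A^{-1}]_{i,i}, cf. (22)."""
    return [1.0 / A_inv_full[i][i].real for i in range(len(A_inv_full))]
```

For $\mathbf{A}(S_1) = [2]$, $\mathbf{a}_u = [j]$ and $a_{u,u} = 3$, the update reproduces the inverse of $\begin{bmatrix} 2 & j \\ -j & 3 \end{bmatrix}$, namely $\tfrac{1}{5}\begin{bmatrix} 3 & -j \\ j & 2 \end{bmatrix}$.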

It can be shown that the per-iteration complexity of the sum
power iterative water-filling algorithm proposed by Jindal et al.
[7] is $O(N^2M^2)$. Therefore, the gZF-DP and ZFS algorithms
have significantly lower computational complexity than the sum
power iterative water-filling algorithm if $M \gg N$.

In the following, we pay attention to the substeps in step 2)
of the ZFS algorithm. Given a set $S_n$, we have [1]

$$c_i(S_n) = \left|\mathbf{h}_{s_i}\mathbf{P}(S_n \setminus \{s_i\})^\perp\right|^2 \qquad (26)$$

where $\mathbf{P}(S_n)^\perp$ denotes the projector onto the orthogonal
complement of $\mathcal{Q}(S_n) = \mathrm{span}\{\mathbf{h}_{s_i} : s_i \in S_n\}$. Note that $c_j(S_{n-1} \cup
\{u\}) \le c_j(S_{n-1})$ for every user $j \in S_{n-1}$. This is due to (26)
and $S_{n-1} \subset S_{n-1} \cup \{u\}$. Therefore, if (20) and (23) yield
$p_u = 0$, then $R_{\mathrm{zf}}(S_{n-1} \cup \{u\}) \le R_{\mathrm{zf}}(S_{n-1})$. We discard
such $u$. We also discard $u$ if (20) and (23) yield $p_{s_i} = 0$ for
some $s_i \in S_{n-1}$. This is done to keep complexity at bay for
otherwise, combinatorial search might effectively emerge. Hence,
user $u$ is a candidate for $S_n$ if $p_i > 0$, $\forall\, i \in S_{n-1} \cup \{u\}$. From
the properties of water-filling, this holds if

$$\frac{n}{c_{i,\min}(S_{n-1} \cup \{u\})} < P + \sum_{i \in S_{n-1} \cup \{u\}} \frac{1}{c_i(S_{n-1} \cup \{u\})} \qquad (27)$$

where $c_{i,\min}(S_{n-1} \cup \{u\}) = \min_{i \in S_{n-1} \cup \{u\}} c_i(S_{n-1} \cup \{u\})$.
Then, we have

$$\mu = \frac{1}{n}\left(P + \sum_{i \in S_{n-1} \cup \{u\}} \frac{1}{c_i(S_{n-1} \cup \{u\})}\right). \qquad (28)$$

If (27) is not satisfied, we skip to the next $u$.

We note that the break in Step 2 is necessary when ZFS is
used but redundant when ZF-DP is used; it is shown in [1] and
[13] that in the latter case, maximum sum rate can always be
achieved with $N$ active users if $P > 0$ [1]. On the other hand,
when ZF alone is used, the optimum number of active users is
$n_{\mathrm{opt}} \le N$ and decreases as $P$ decreases, so that for $P \to 0$,
the ZF scheme reduces to maximum ratio combining (MRC),
$n_{\mathrm{opt}} = 1$ [1]. This also holds for the proposed ZFS algorithm,
which follows from the water-filling equation in (20) and the
fact that $c_1(S_1) = \max_{u \in U} a_{u,u}$.


IV. Performance Analysis in Independent Rayleigh Fading

In this section, we evaluate the throughput of the greedy
ZF-DP algorithm [13] in independent Rayleigh fading when
channels remain constant over the duration of a transmission of
a block of symbols. The channels of all $M$ users are assumed to
have i.i.d. entries, which are circularly symmetric, zero-mean,
complex Gaussian random variables (r.v.s) with unit variance
$h_{m,n} \sim \mathcal{CN}(0, 1)$. In [1], the average throughput of the ZF-DP
and ZF schemes in independent Rayleigh fading under a
long-term power constraint for general $N$ and $M$ is evaluated.

As noted earlier, the simple ZF-DP and ZF algorithms in [1]
do not attempt to optimize throughput with respect to user
selection and ordering when $M > N$. Instead, users are selected
and ordered randomly.

A. gZF-DP Sum Rate Under Long-Term Power Constraint

We model the greedy ZF-DP algorithm [13] under a long-term
power constraint. We are interested in evaluating

$$R_{\mathrm{gZF\text{-}DP}} = E\left[\sum_{i=1}^{N} [\log(\mu_0 d_i)]_+\right] \qquad (29)$$

where $\mu_0$ is the solution of the water-filling equation, stemming
from the long-term (LT) power constraint

$$E\left[\sum_{i=1}^{N} \left[\mu_0 - \frac{1}{d_i}\right]_+\right] = P. \qquad (30)$$

Note that the optimum $\mu_0$ determined by (30) will be a
deterministic function of the statistics of the $d_i$'s and not a function
of the random variables themselves. By this and linearity of
expectation, we can rewrite (29) as

$$R_{\mathrm{gZF\text{-}DP}} = \sum_{i=1}^{N} E[\log(\mu_0 d_i)]_+ = \sum_{i=1}^{N} \int_0^\infty [\log(\mu_0 x)]_+ f_{d_i}(x)\,dx.$$

Therefore

$$R_{\mathrm{gZF\text{-}DP}} = \sum_{i=1}^{N} \int_{1/\mu_0}^{\infty} \log(\mu_0 x) f_{d_i}(x)\,dx \qquad (31)$$

where $f_{d_i}(x)$ denotes the probability density function (pdf) of
$d_i$. Similarly, (30) becomes

$$\sum_{i=1}^{N} \left( \mu_0 \int_{1/\mu_0}^{\infty} f_{d_i}(x)\,dx - \int_{1/\mu_0}^{\infty} \frac{1}{x} f_{d_i}(x)\,dx \right) = P. \qquad (32)$$

In order to evaluate $R$, we need to evaluate the pdfs of $d_i$'s
based on the knowledge of channel statistics and selection
procedure. Our derivation below draws in part from performance
analysis tools in [5], [17], which we tailor to fit the context of
gZF-DP. In particular, our analysis accounts for and exploits the
specific selection procedure employed in gZF-DP.
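When the pdfs of the $d_i$'s are not yet available in closed form, the long-term water level in (30) and the rate in (29) can be approximated from channel realizations. The Python sketch below replaces the expectations by sample averages and finds $\mu_0$ by bisection; this Monte Carlo shortcut and the function names are assumptions of the illustration, and it is not the analytical evaluation developed in this section.

```python
import math


def average_power(samples, mu0):
    """Sample mean of sum_i [mu0 - 1/d_i]_+, the left-hand side of (30).

    samples: list of realizations, each a list [d_1, ..., d_N] with d_i > 0.
    """
    return sum(sum(max(0.0, mu0 - 1.0 / d) for d in ds)
               for ds in samples) / len(samples)


def long_term_level(samples, P, iters=100):
    """Bisection for mu0 such that the average power equals P (P > 0)."""
    lo, hi = 0.0, 1.0
    while average_power(samples, hi) < P:     # grow the bracket
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if average_power(samples, mid) < P:
            lo = mid
        else:
            hi = mid
    return hi


def average_rate(samples, mu0):
    """Sample mean of sum_i [log(mu0 d_i)]_+, cf. (29), in nats."""
    return sum(sum(max(0.0, math.log(mu0 * d)) for d in ds)
               for ds in samples) / len(samples)
```

Unlike the short-term water-filling in (11), a single level $\mu_0$ is shared by all realizations here, so a realization with weak gains may receive little or no power.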


B.  Probability  Density  Functions 

It is instructive to consider the modeling of the pdf of $d_1$ first, followed by modeling the pdf of $d_2$, and then generalizing to compute the pdf of $d_n$ for general $n \le N$. First, let us determine the distribution of $r_{1,u} = \mathbf{h}_u \mathbf{h}_u^*$. Note that $r_{1,u}$ is a sum of $N$ squared magnitudes of circularly symmetric, zero-mean, unit-variance complex Gaussian random variables. Therefore, it has Chi-squared distribution with $2N$ degrees of freedom ($r_{1,u} \sim \chi^2_{2N}$), whose pdf is

\[ f_{r_1}(x_1) = \frac{x_1^{N-1}}{\Gamma(N)} \exp(-x_1). \tag{33} \]

$\Gamma(N)$ denotes the Gamma function, and $\Gamma(N) = (N-1)!$ for a positive integer $N$. According to the selection algorithm

\[ d_1 = \max_{u \in \mathcal{U}} r_{1,u}. \tag{34} \]

From order statistics, e.g., [3, (2.1.1)], we obtain the pdf of $d_1$ as

\[ f_{d_1}(x_1) = M \left[ F_{r_1}(x_1) \right]^{M-1} f_{r_1}(x_1) \tag{35} \]

where $F_{r_1}(x_1)$ is the cumulative distribution function (cdf) of $r_1$. We say that the distribution of $r_{1,u}$ is the parent distribution of the order statistics $r_{1,(1)} \ge r_{1,(2)} \ge \cdots \ge r_{1,(M)}$, where $r_{1,(i)}$ is the $i$th largest $r_{1,u}$ for $u \in \mathcal{U}$.
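As a sanity check on (33) and (35), the mean of $d_1$ obtained by numerically integrating the analytic pdf can be compared with a Monte Carlo estimate. This is a sketch only; the helper names (`gamma_pdf`, `gamma_cdf`, `d1_pdf`), the values $N = 4$, $M = 8$, the grid, and the sample size are illustrative assumptions.

```python
import math
import numpy as np

def gamma_pdf(x, N):
    """pdf (33) of r_{1,u}: x^(N-1) exp(-x) / Gamma(N)."""
    return x ** (N - 1) * np.exp(-x) / math.gamma(N)

def gamma_cdf(x, N):
    """cdf of (33) for integer N: 1 - exp(-x) * sum_{k<N} x^k / k!."""
    return 1.0 - np.exp(-x) * sum(x ** k / math.factorial(k) for k in range(N))

def d1_pdf(x, N, M):
    """pdf (35) of the largest of M i.i.d. draws from the parent pdf (33)."""
    return M * gamma_cdf(x, N) ** (M - 1) * gamma_pdf(x, N)

N, M, trials = 4, 8, 50_000
rng = np.random.default_rng(0)
# i.i.d. CN(0, 1) channel entries: real and imaginary parts have variance 1/2.
h = (rng.standard_normal((trials, M, N))
     + 1j * rng.standard_normal((trials, M, N))) / np.sqrt(2)
d1 = (np.abs(h) ** 2).sum(axis=2).max(axis=1)   # d_1 = max_u ||h_u||^2

dx = 0.005
x = np.arange(0.0, 60.0, dx)                    # uniform grid for Riemann sums
pdf = d1_pdf(x, N, M)
total_mass = pdf.sum() * dx                     # should be close to 1
analytic_mean = (x * pdf).sum() * dx
empirical_mean = d1.mean()
```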

Noting that $r_{1,u} \le d_1$ for all of the remaining users ($u \in \mathcal{U} \setminus \mathcal{S}_1$), it follows that the posterior distribution of $r_{1,u}$ of the remaining users (after selecting user $s_1$) depends on the realization of $d_1$. In the sequel, we will need to use the conditional pdf of $r_1$ of the remaining users given a realization of $d_1$. According to (34) and, e.g., [3, Th. 2.7], the parent distribution of the order statistics of the remaining users $u \in \mathcal{U} \setminus \mathcal{S}_1$ is equal to $f_{r_1}(x_1)$ truncated on the right at the value of $d_1$

\[ f_{r_1|d_1}(x_1|y_1) = \begin{cases} \dfrac{f_{r_1}(x_1)}{F_{r_1}(y_1)}, & x_1 \in [0, y_1] \\ 0, & \text{otherwise.} \end{cases} \tag{36} \]


After setting $n = 2$, the selection algorithm proceeds by projecting the channel vectors of all of the remaining users onto the orthogonal complement of the subspace spanned by the channel vector of user $s_1$. From (15), we have $r_{2,u} = \mathbf{h}_u \mathbf{P}_2^{\perp} \mathbf{h}_u^*$, for $u \in \mathcal{U} \setminus \mathcal{S}_1$, where $\mathbf{P}_2^{\perp}$ is given in (13). The distribution of $r_{2,u}$ given $d_1$, which is denoted $f_{r_2|d_1}(x_2|y_1)$, then becomes the parent distribution of the order statistics given $d_1$ for $i > 2$. Therefore, we need a mapping from $f_{r_1|d_1}(x_1|y_1)$ to $f_{r_2|d_1}(x_2|y_1)$ that models the projection step

\[ f_{r_2|d_1}(x_2|y_1) = \int_{0}^{\infty} f_{r_2|r_1,d_1}(x_2|v, y_1)\, f_{r_1|d_1}(v|y_1)\, dv. \tag{37} \]

Here, $f_{r_2|r_1,d_1}(x_2|v, y_1)$ denotes the pdf of $r_{2,u}$, given realizations of $r_{1,u}$ and $d_1$. Note that $r_{2,u} \le r_{1,u} \le d_1$. $\mathbf{h}_{s_1}$ is statistically independent of $\mathbf{h}_u$, for $u \in \mathcal{U} \setminus \mathcal{S}_1$, so that from the point of view of the users in $\mathcal{U} \setminus \mathcal{S}_1$, $\mathbf{P}_2^{\perp}$ appears to be a randomly selected projector matrix. However, the first user has been selected after considering the channels of all users, and thus, there might be mild dependence between the channels of the remaining users in $\mathcal{U} \setminus \mathcal{S}_1$ and $\mathbf{P}_2^{\perp}$. For analytical tractability, we will ignore this dependence. Our simulation results corroborate this approximation: the difference is not noticeable in simulations.

Assumption 1: We therefore assume that $d_1$ conveys no information about $\mathbf{P}_2^{\perp}$, i.e., $f_{r_2|r_1,d_1}(x_2|v, y_1)$ has the Markovian property

\[ f_{r_2|r_1,d_1}(x_2|v, y_1) = f_{r_2|r_1}(x_2|v). \tag{38} \]


The pdf $f_{r_2|r_1}(x_2|v)$ is obtained from the following.

Claim 1: Let $\mathbf{h} = [h_1 \ldots h_N]$ and $\mathbf{p} = [p_1 \ldots p_N]$ denote independent $N$-dimensional random (row-) vectors with i.i.d., circularly symmetric, zero-mean, complex Gaussian entries with unit variance, $h_n \sim \mathcal{CN}(0, 1)$ and $p_n \sim \mathcal{CN}(0, 1)$. Let $Y := \mathbf{h}\mathbf{h}^*$ and $X := \mathbf{h}\mathbf{P}\mathbf{h}^*$, where $\mathbf{P} = \mathbf{I}_N - \mathbf{p}^* (\mathbf{p}\mathbf{p}^*)^{-1} \mathbf{p}$ [cf. (13)] is an $N \times N$ projector matrix with $N - 1$ eigenvalues equal to 1 and one eigenvalue equal to 0. Then, the cdf of $X$, given $Y$, is given by

\[ F_{X|Y}(x|y) = \begin{cases} \left( \dfrac{x}{y} \right)^{N-1}, & \text{for } x \in [0, y] \\ 0, & \text{elsewhere.} \end{cases} \tag{39} \]

Remark 1: The rigorous proof of this claim turned out to be elusive, but it is very well supported by simulations. Fig. 1 depicts $F_{X|Y}(x|y)$ versus $x/y$ for $N = 2$, 3, and 4. Lines show empirical cdfs obtained by Monte Carlo (MC) simulations, and markers show samples of analytic curves given by (39). In MC simulations, for each value of $N$, there were $2 \times 10^5$ random realizations of $\mathbf{P}$ given $\mathbf{h}$, for $10^2$ realizations of $\mathbf{h}$. The empirical $F_{X|Y}(x|y)$ is discrete. Its support $x/y \in [0, 1]$ is divided into 200 intervals of length 1/200. The match in Fig. 1 is very accurate.

Fig. 1. cdf when $\mathbf{h}$ is a $1 \times N$ channel vector.

From (39), we obtain

\[ f_{r_2|r_1}(x_2|v) = \begin{cases} \dfrac{N-1}{v} \left( \dfrac{x_2}{v} \right)^{N-2}, & \text{for } x_2 \in [0, v] \\ 0, & \text{otherwise.} \end{cases} \tag{40} \]

From (18), it follows that $d_2$, conditioned on a realization of $d_1$, is the maximum of $M - 1$ r.v.s with the parent distribution given by the pdf $f_{r_2|d_1}(x_2|x_1)$ from (37). Using order statistics, we obtain [3]

\[ f_{d_2|d_1}(x_2|x_1) = (M - 1) \left[ F_{r_2|d_1}(x_2|x_1) \right]^{M-2} f_{r_2|d_1}(x_2|x_1). \tag{41} \]

Since $f_{r_2|d_1}(x_2|x_1) = 0$ for $x_2 > x_1$, it follows that $d_2 \le d_1$. Finally

\[ f_{d_2}(x_2) = \int_{x_1=0}^{\infty} f_{d_2|d_1}(x_2|x_1)\, f_{d_1}(x_1)\, dx_1 \tag{42} \]

for $x_1 \ge x_2$.
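Claim 1 lends itself to a quick numerical check in the spirit of Remark 1. The sketch below is an illustration, not the authors' simulation setup: it pools the ratio $X/Y$ over random draws of both $\mathbf{h}$ and $\mathbf{p}$ (rather than conditioning on fixed realizations of $\mathbf{h}$) and compares the empirical cdf at one point with $(x/y)^{N-1}$ from (39). The names `crandn` and `ratio_samples` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """i.i.d. CN(0, 1) samples (real and imaginary parts of variance 1/2)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def ratio_samples(N, trials):
    """Samples of X / Y with Y = h h^* and X = h P h^*,
    where P = I - p^* (p p^*)^{-1} p for row vectors h and p."""
    h = crandn(trials, N)
    p = crandn(trials, N)
    Y = (np.abs(h) ** 2).sum(axis=1)
    inner = (h * p.conj()).sum(axis=1)           # h p^*
    X = Y - np.abs(inner) ** 2 / (np.abs(p) ** 2).sum(axis=1)
    return X / Y

t = 0.5                                          # evaluation point x / y
empirical = {N: float(np.mean(ratio_samples(N, 200_000) <= t)) for N in (2, 3, 4)}
analytic = {N: t ** (N - 1) for N in (2, 3, 4)}  # (39) evaluated at x / y = t
```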

Armed with these insights, we can now generalize to the computation of the pdf of $d_n$ for $n \le N$. The associated derivation is deferred to the Appendix. Using the results of Section IV-A, the pdf of $d_n$ is obtained as a marginal distribution:

\[ f_{d_n}(x_n) = \int \cdots \int f_{d_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) \prod_{k=2}^{n-1} f_{d_k|d_{k-1},\ldots,d_1}(x_k|x_{k-1}, \ldots, x_1) \, f_{d_1}(x_1)\, dx_{n-1} \ldots dx_1 \tag{43} \]

for $x_1 \ge x_2 \ge \cdots \ge x_n$.

The pdfs of $d_n$ for $n = 1, \ldots, N$ can be written in a more compact form, facilitating analysis and numerical integration.

Proposition 1: Define

\[ \phi_1(x_1) = f_{r_1}(x_1) \tag{44} \]

and

\[ \phi_n(x_n, x_{n-1}, \ldots, x_1) = \frac{x_n^{N-n}}{\Gamma(N-n+1)} \int_{v_{n-1}=x_n}^{x_{n-1}} \int_{v_{n-2}=v_{n-1}}^{x_{n-2}} \cdots \int_{v_1=v_2}^{x_1} \exp(-v_1)\, dv_1 \ldots dv_{n-1}. \tag{45} \]

Then, $f_{d_n}(x_n)$ is given by (46). The proof is given in the Appendix. We will use the forms in the above proposition in the Proof of Theorem 1, whose statement appears in Section IV-C.

Fig. 2. Family of pdfs of $d_n$ for $N = 4$, $M = 8$.

Fig. 2 depicts an example of pdfs of $d_n$ for $N = 4$ and $M = 8$. Full lines depict analytically obtained pdfs. Markers show samples of the empirically obtained pdfs through Monte Carlo (MC) simulations. There are $10^6$ MC samples. For every $d_n$, the support of the empirical pdf is truncated where the tail becomes insignificant. Then, the empirical pdf is discretized by dividing the truncated support into 100 equal intervals. These results justify the approximation (Assumption 1) made in the course of an analytical derivation for tractability considerations.
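Empirical samples of the $d_n$'s such as those behind Fig. 2 can be generated by simulating the selection step described above: pick the user with the largest (projected) channel norm, then project the remaining channels onto the orthogonal complement of the selected one. The sketch below follows that description only; equations (13), (15), and (18) are not reproduced in this section, so the function `greedy_d` and its Gram-Schmidt style update are assumptions rather than the authors' implementation.

```python
import numpy as np

def greedy_d(H):
    """H: M x N complex matrix whose rows are user channels (M >= N).
    Returns the sequence d_1, ..., d_N produced by greedy selection."""
    M, N = H.shape
    R = H.astype(complex)                 # rows: channels projected so far
    alive = np.ones(M, dtype=bool)        # users not yet selected
    d = []
    for _ in range(N):
        r = (np.abs(R) ** 2).sum(axis=1)  # r_{n,u} for every user
        r[~alive] = -1.0                  # exclude already selected users
        s = int(np.argmax(r))
        d.append(r[s])
        alive[s] = False
        q = R[s] / np.sqrt(r[s])          # unit row vector along selected direction
        R = R - np.outer(R @ q.conj(), q) # project rows onto orthogonal complement
    return np.array(d)

rng = np.random.default_rng(2)
N, M = 4, 8
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
d = greedy_d(H)
```

Repeating the draw of `H` many times and histogramming each entry of `d` gives empirical pdfs that can be set against the analytic ones.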


C.  Throughput  of  gZF-DP  at  High  SNR 

Let $R_{\text{gZF-DP}}$ denote the average throughput of the gZF-DP algorithm. Let $\rho = 10 \log_{10} P$ denote the SNR, where the noise variance of each user is assumed equal to 1. We have the following result.

Theorem 1: Let $N \le M$, and let $P$ be the power limit. Then, under our working assumptions

\[ \lim_{P \to \infty} \frac{\partial R_{\text{gZF-DP}}}{\partial \rho} = \frac{N \log_2 10}{10} \ \frac{\text{bits}}{\text{dB}}. \tag{47} \]

The proof is given in the Appendix. The above theorem shows that the throughput versus SNR slope of the gZF-DP algorithm in the high SNR regime is proportional to the number of antennas at the transmitter $N$. Note that this is the theoretical limit of the capacity versus SNR slope for a multiple-input multiple-output (MIMO) system with $N$ transmit and $M \ge N$ receive antennas [9].


V.  Comparison  of  Greedy  ZF-DP  and  ZFS 

The throughputs of the gZF-DP and ZFS algorithms are presented in Figs. 3 and 4. The y-axis shows sum capacity and sum rate in bits per channel use. The x-axis shows total power $P$ in decibels. The noise level of every user is 1. The sum capacity and sum rates are averaged over 100 channels. Channels are complex-valued, drawn from an i.i.d. Rayleigh distribution with unit variance for each channel entry. The sum capacity is obtained using the approach proposed in [14].

For the gZF-DP algorithm, analysis (obtained under a long-term power constraint) yields throughput very close to that obtained via simulations (under a short-term power constraint). This can be explained as follows. Capitalizing on multiuser diversity, gZF-DP selects and orders channels (users) from a large pool of statistically independent candidates. The result is that the ensuing $d_i$'s are far more stable than they would have been without user selection and ordering. This justifies the use of a long-term power constraint for analysis, as opposed to the short-term power constraint originally proposed in the algorithm and used in simulations.

\[ f_{d_n}(x_n) = \frac{M!}{(M-n)!} \int_{x_1=x_n}^{\infty} \int_{x_2=x_n}^{x_1} \cdots \int_{x_{n-1}=x_n}^{x_{n-2}} \left[ \int_{y=0}^{x_n} \phi_n(y, x_{n-1}, \ldots, x_1)\, dy \right]^{M-n} \prod_{k=1}^{n} \phi_k(x_k, \ldots, x_1)\, dx_{n-1} \ldots dx_1. \tag{46} \]

Fig. 3. ZFS versus Greedy ZF-DP versus Sum capacity: $M = 8$ users, $N = 2$, and $N = 4$.

Fig. 4. ZFS versus Greedy ZF-DP versus Sum Capacity: $M = 16$ users, $N = 2$, and $N = 4$.

In these scenarios ($N = 2$ or 4 and $M = 8$ or 16), both gZF-DP and ZFS algorithms achieve throughput close to sum capacity. Note that ZFS exhibits the same slope of rate increase per decibel of SNR as the gZF-DP algorithm and the sum capacity curve at moderate and high SNR.

Fig. 5 shows the throughput of the ZFS algorithm as a fraction of the throughput of the gZF-DP algorithm for various pairs $N$, $M$ at 20 dB SNR. The curves are obtained by simulations, averaging over $2 \times 10^4$ channels for each pair $N$, $M$. For all $N$, $M$ considered, this fraction stays between 0.875 and 0.985. For a given $M$, the gap between gZF-DP and ZFS increases as $N$ increases, but even for $N = 8$, the gap is uniformly less than 13% of the gZF-DP throughput. Note that a realistic implementation of DP coding will incur a certain rate loss for the gZF-DP algorithm, so that the gap would be smaller in reality.

Fig. 5. $R_{\text{ZFS}} / R_{\text{gZF-DP}}$ for various numbers of antennas, $N$, and users, $M$, at 20 dB SNR.

Given $N$ and for sufficiently large $M$, Fig. 5 shows that the gap between ZFS and gZF-DP decreases with $M$. This is due to multiuser diversity: the more users that contend for transmission, the higher the probability that $N$ of them will be almost orthogonal. This in turn reduces the advantage of DP-coding-based schemes over ZFS. Depending on $N$, the fraction of sum rate of ZFS over the sum rate of gZF-DP may first exhibit a dip before starting to increase steadily with $M$. While the dip is small (less than 3%), it is noticeable, and we do not have an explanation for it. We have observed that, as SNR increases, more transmit antennas are required for this dip to occur.

VI.  Conclusions 

We have considered two algorithms that capitalize on multiuser diversity to achieve a significant fraction of the multi-antenna downlink sum capacity when the number of users $M$ is greater than the number of antennas $N$. We have analyzed the throughput performance of the greedy ZF-DP algorithm in independent Rayleigh fading and characterized the pdfs of certain key parameters of interest. Determining the proper number of samples required for accurate Monte Carlo estimates is a difficult issue without a baseline. While the end result of gZF-DP performance analysis requires sequential numerical integration and is admittedly cumbersome, it provides such a baseline and thus corroborates the results of Monte Carlo estimation. In addition, numerical integration is simpler than Monte Carlo simulation for a small number of transmit antennas. Furthermore, our analysis allowed us to establish that at high SNR, the throughput versus SNR slope of the gZF-DP algorithm is proportional to $N$.

We have also proposed another low-complexity algorithm, dubbed ZFS, which does not require DP coding at the transmitter. We have shown that the selection procedures of gZF-DP and ZFS algorithms have the same complexity order $O(N^3 M)$, which is significantly smaller than the complexity of the optimal algorithms when $M \gg N$. We have evaluated the throughput performance of the ZFS algorithm via simulations. The results show that for a realistic number of transmit antennas, ZFS achieves a significant fraction of the throughput of gZF-DP and sum capacity at a low coding and online computation cost. The simulation results also indicate that at high SNR, ZFS achieves the same slope of throughput per decibel of SNR as the capacity-achieving strategy based on the use of DP coding for known interference cancellation and convex optimization.

Due to its simplicity, low complexity, and close to optimal performance, the proposed ZFS method offers an attractive alternative to earlier DP-based methods when $M \gg N$.


Appendix A

Derivation of the PDF of $d_n$

Note that there are three basic steps in deriving $f_{d_n}(x_n)$:

1) Truncation of the parent pdf after selecting user $s_{n-1}$: Find the conditional pdf of $r_{n-1,u}$ of the remaining users ($u \in \mathcal{U} \setminus \mathcal{S}_{n-1}$) given realizations of $d_{n-1} \le \cdots \le d_1$. From order statistics [3], we obtain

\[ f_{r_{n-1}|d_{n-1},\ldots,d_1}(x_{n-1}|y_{n-1}, \ldots, y_1) = \begin{cases} \dfrac{f_{r_{n-1}|d_{n-2},\ldots,d_1}(x_{n-1}|y_{n-2}, \ldots, y_1)}{F_{r_{n-1}|d_{n-2},\ldots,d_1}(y_{n-1}|y_{n-2}, \ldots, y_1)}, & x_{n-1} \le y_{n-1} \le \cdots \le y_1 \\ 0, & \text{otherwise.} \end{cases} \tag{48} \]

2) Mapping of $f_{r_{n-1}|d_{n-1},\ldots,d_1}(x_{n-1}|y_{n-1}, \ldots, y_1)$ into $f_{r_n|d_{n-1},\ldots,d_1}(x_n|y_{n-1}, \ldots, y_1)$: Given realizations of $d_i$ for $i = 1, \ldots, n-1$, where $n \le N$, there are $n - 1$ quadratic-form equations

\[ d_i = \mathbf{h}_{s_i} \mathbf{P}_i^{\perp} \mathbf{h}_{s_i}^*. \]

Let the eigenvalue decomposition of $\mathbf{P}_i^{\perp}$ be

\[ \mathbf{P}_i^{\perp} = \mathbf{U}_i \boldsymbol{\Theta}_i \mathbf{U}_i^*. \]

From (13), it follows that there are $N - i + 1$ eigenvalues equal to 1 and $i - 1$ eigenvalues equal to zero. Then, we can write

\[ d_i = \sum_{j=1}^{N-i+1} \left| \mathbf{h}_{s_i} \mathbf{u}_{i,j} \right|^2. \]

As per Assumption 1, we neglect the (mild) dependence of the projector matrices $\mathbf{P}_i^{\perp}$ on the $d_i$'s for $i = 1, \ldots, n-1$. This yields

\[ f_{r_n|r_{n-1},d_{n-1},\ldots,d_1}(x_n|v, y_{n-1}, \ldots, y_1) = f_{r_n|r_{n-1}}(x_n|v). \tag{49} \]

Since the projection $\mathbf{h}_u \mathbf{P}_n^{\perp}$ is a vector in an $(N - n + 1)$-dimensional subspace, it follows from Claim 1 that

\[ f_{r_n|r_{n-1}}(x_n|v) = \begin{cases} \dfrac{N-n+1}{v} \left( \dfrac{x_n}{v} \right)^{N-n}, & \text{for } x_n \in [0, v] \\ 0, & \text{otherwise.} \end{cases} \tag{50} \]

Then, the pdf of the parent distribution of $r_{n,u}$ of the remaining users given $d_{n-1} \le \cdots \le d_1$ is

\[ f_{r_n|d_{n-1},\ldots,d_1}(x_n|y_{n-1}, \ldots, y_1) = \int_{0}^{\infty} f_{r_n|r_{n-1}}(x_n|v)\, f_{r_{n-1}|d_{n-1},\ldots,d_1}(v|y_{n-1}, \ldots, y_1)\, dv \tag{51} \]

where $x_n \le v \le y_{n-1} \le \cdots \le y_1$.

3) $d_n$ conditioned on $d_{n-1}, \ldots, d_1$ is the maximum of $M - n + 1$ r.v.s with pdf given in (51). Using order statistics [3], we obtain

\[ f_{d_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) = (M - n + 1) \left[ F_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) \right]^{M-n} f_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1). \tag{52} \]

Appendix B
Proofs

Proof of Proposition 1: Let us first prove the following:

\[ f_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) = \frac{\phi_n(x_n, x_{n-1}, \ldots, x_1)}{\prod_{j=1}^{n-1} F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1)} \tag{53} \]

where $\phi_n(x_n, x_{n-1}, \ldots, x_1)$ is given in (45).

This is proven by induction. For $n = 2$, we have

\[ f_{r_2|d_1}(x_2|x_1) = \int_{y=x_2}^{x_1} f_{r_2|r_1}(x_2|y)\, f_{r_1|d_1}(y|x_1)\, dy. \]

From (33), (36), and (40), we obtain

\[ f_{r_2|d_1}(x_2|x_1) = \frac{x_2^{N-2}}{F_{r_1}(x_1)\, \Gamma(N-1)} \int_{v_1=x_2}^{x_1} \exp(-v_1)\, dv_1. \]

From (45), it follows that

\[ f_{r_2|d_1}(x_2|x_1) = \frac{\phi_2(x_2, x_1)}{F_{r_1}(x_1)}. \]

Induction hypothesis: (53).

Induction step:

\[ f_{r_{n+1}|d_n,\ldots,d_1}(x_{n+1}|x_n, \ldots, x_1) = \int_{v_n=x_{n+1}}^{x_n} f_{r_{n+1}|r_n}(x_{n+1}|v_n)\, f_{r_n|d_n,\ldots,d_1}(v_n|x_n, \ldots, x_1)\, dv_n. \]

From (48) and (50), we obtain

\[ f_{r_{n+1}|d_n,\ldots,d_1}(x_{n+1}|x_n, \ldots, x_1) = \left[ F_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) \right]^{-1} \int_{v_n=x_{n+1}}^{x_n} \frac{N-n}{v_n} \left( \frac{x_{n+1}}{v_n} \right)^{N-n-1} f_{r_n|d_{n-1},\ldots,d_1}(v_n|x_{n-1}, \ldots, x_1)\, dv_n. \]

By the induction hypothesis, we have

\[ f_{r_{n+1}|d_n,\ldots,d_1}(x_{n+1}|x_n, \ldots, x_1) = \left[ F_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) \right]^{-1} \left[ \prod_{j=1}^{n-1} F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1) \right]^{-1} \int_{v_n=x_{n+1}}^{x_n} \frac{N-n}{v_n} \left( \frac{x_{n+1}}{v_n} \right)^{N-n-1} \phi_n(v_n, x_{n-1}, \ldots, x_1)\, dv_n. \]

From (45), it follows that

\[ f_{r_{n+1}|d_n,\ldots,d_1}(x_{n+1}|x_n, \ldots, x_1) = \left[ \prod_{j=1}^{n} F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1) \right]^{-1} \frac{N-n}{\Gamma(N-n+1)}\, x_{n+1}^{N-n-1} \int_{v_n=x_{n+1}}^{x_n} \int_{v_{n-1}=v_n}^{x_{n-1}} \cdots \int_{v_1=v_2}^{x_1} \exp(-v_1)\, dv_1 \ldots dv_{n-1}\, dv_n. \]

Applying (45) again, we have

\[ f_{r_{n+1}|d_n,\ldots,d_1}(x_{n+1}|x_n, \ldots, x_1) = \frac{\phi_{n+1}(x_{n+1}, x_n, \ldots, x_1)}{\prod_{j=1}^{n} F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1)}. \]

Now, we use the above result to prove Proposition 1. For $n = 1$, from (44), we obtain

\[ f_{d_1}(x_1) = M \left[ \int_{y=0}^{x_1} \phi_1(y)\, dy \right]^{M-1} \phi_1(x_1). \]

For $1 < n \le N$ and substituting (52) into (43), we obtain

\[ f_{d_n}(x_n) = \frac{M!}{(M-n)!} \int_{x_1=x_n}^{\infty} \int_{x_2=x_n}^{x_1} \cdots \int_{x_{n-1}=x_n}^{x_{n-2}} \left[ F_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) \right]^{M-n} f_{r_n|d_{n-1},\ldots,d_1}(x_n|x_{n-1}, \ldots, x_1) \prod_{k=2}^{n-1} \left[ F_{r_k|d_{k-1},\ldots,d_1}(x_k|x_{k-1}, \ldots, x_1) \right]^{M-k} f_{r_k|d_{k-1},\ldots,d_1}(x_k|x_{k-1}, \ldots, x_1) \, \left[ F_{r_1}(x_1) \right]^{M-1} f_{r_1}(x_1)\, dx_{n-1} \ldots dx_1. \]

Applying (53), we obtain

\[ f_{d_n}(x_n) = \frac{M!}{(M-n)!} \int_{x_1=x_n}^{\infty} \int_{x_2=x_n}^{x_1} \cdots \int_{x_{n-1}=x_n}^{x_{n-2}} \left[ \int_{y=0}^{x_n} \phi_n(y, x_{n-1}, \ldots, x_1)\, dy \right]^{M-n} \prod_{k=1}^{n} \phi_k(x_k, \ldots, x_1)\; G_{n-1}(x_{n-1}, \ldots, x_1) \left[ F_{r_1}(x_1) \right]^{M-1} dx_{n-1} \ldots dx_1 \]

where

\[ G_{n-1}(x_{n-1}, \ldots, x_1) = \frac{\prod_{k=2}^{n-1} \left[ F_{r_k|d_{k-1},\ldots,d_1}(x_k|x_{k-1}, \ldots, x_1) \right]^{M-k}}{\prod_{j=1}^{n-1} \left[ F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1) \right]^{M-n+1} \prod_{k=2}^{n-1} \prod_{j=1}^{k-1} F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1)}. \]

Dividing the left fraction and rearranging the right one, we obtain

\[ G_{n-1}(x_{n-1}, \ldots, x_1) = \frac{\prod_{k=2}^{n-1} \left[ F_{r_k|d_{k-1},\ldots,d_1}(x_k|x_{k-1}, \ldots, x_1) \right]^{n-1-k}}{\prod_{j=1}^{n-2} \left[ F_{r_j|d_{j-1},\ldots,d_1}(x_j|x_{j-1}, \ldots, x_1) \right]^{n-1-j}} \cdot \frac{1}{\left[ F_{r_1}(x_1) \right]^{M-n+1}} \]

\[ G_{n-1}(x_{n-1}, \ldots, x_1) = \frac{1}{\left[ F_{r_1}(x_1) \right]^{M-1}}. \]

Therefore

\[ f_{d_n}(x_n) = \frac{M!}{(M-n)!} \int_{x_1=x_n}^{\infty} \int_{x_2=x_n}^{x_1} \cdots \int_{x_{n-1}=x_n}^{x_{n-2}} \left[ \int_{y=0}^{x_n} \phi_n(y, x_{n-1}, \ldots, x_1)\, dy \right]^{M-n} \prod_{k=1}^{n} \phi_k(x_k, \ldots, x_1)\, dx_{n-1} \ldots dx_1. \]

Proof of Theorem 1:

\[ \frac{\partial}{\partial \rho} R_{\text{gZF-DP}} = \frac{\partial R_{\text{gZF-DP}}}{\partial \mu} \frac{\partial \mu}{\partial \rho}, \quad \text{where} \quad \frac{\partial \mu}{\partial \rho} = \frac{\partial \mu}{\partial P} \frac{\partial P}{\partial \rho}. \]

$\rho = 10 \log_{10}(P)$ so that from (32), we have

\[ \frac{\partial \mu}{\partial \rho} = \frac{\partial \mu}{\partial P} \frac{\partial P}{\partial \rho} = \left( \frac{\partial P}{\partial \mu} \right)^{-1} \frac{P \ln 10}{10}. \]

Using the Leibniz rule, from (31) with rates measured in bits, we have

\[ \frac{\partial R_{\text{gZF-DP}}}{\partial \mu} = \frac{\log_2 e}{\mu} \sum_{i=1}^{N} \left( 1 - F_{d_i}\!\left( \frac{1}{\mu} \right) \right). \]

It follows that

\[ \frac{\partial}{\partial \rho} R_{\text{gZF-DP}} = \frac{\log_2 10}{10} \frac{P}{\mu}. \]

In order to determine $\lim_{\rho \to \infty} (\partial / \partial \rho) R_{\text{gZF-DP}}$, we will determine $\lim_{\rho \to \infty} (P/\mu)$. Note that $\rho \to \infty$ is equivalent to $P \to \infty$. In addition, $(\partial P / \partial \mu) = N - \sum_{i=1}^{N} F_{d_i}(1/\mu) > 0$ for $\mu > 0$ so that $P \to \infty$ is equivalent to $\mu \to \infty$. We will prove that $\lim_{\mu \to \infty} (P/\mu) = N$. From (32), we have

\[ \frac{P}{\mu} = \sum_{i=1}^{N} \int_{1/\mu}^{\infty} f_{d_i}(z)\, dz - \frac{1}{\mu} \sum_{i=1}^{N} \int_{1/\mu}^{\infty} \frac{1}{z} f_{d_i}(z)\, dz. \]

Then

\[ \lim_{\mu \to \infty} \frac{P}{\mu} = N - \lim_{\mu \to \infty} \frac{1}{\mu} g_N(\mu) \]

where

\[ g_N(\mu) = \sum_{i=1}^{N} \int_{1/\mu}^{\infty} \frac{1}{z} f_{d_i}(z)\, dz. \]

Note that if we demonstrate that

\[ \lim_{\mu \to \infty} \frac{\partial}{\partial \mu} g_N(\mu) = 0 \]

the desired result will follow because

\[ \lim_{\mu \to \infty} \frac{\partial}{\partial \mu} g_N(\mu) = 0 \ \Rightarrow\ \lim_{\mu \to \infty} g_N(\mu) = O(1) \ \Rightarrow\ \lim_{\mu \to \infty} \frac{1}{\mu} g_N(\mu) = 0 \ \Rightarrow\ \lim_{\mu \to \infty} \frac{P}{\mu} = N. \]

It is easy to check that $(\partial / \partial \mu) g_N(\mu) = \sum_{i=1}^{N} (1/\mu) f_{d_i}(1/\mu)$ so that it suffices to prove that $\lim_{\mu \to \infty} f_{d_i}(1/\mu) = 0$ or equivalently, $\lim_{x \to 0} f_{d_i}(x) = 0$, for $i = 1, 2, \ldots, N$, where $N \ge 2$ and $N \le M$.

From (33) and (44), it follows that $\phi_1(0) = 0$. Then, from (35), it follows that $f_{d_1}(0) = 0$.

In order to prove that $f_{d_n}(0) = 0$ for $1 < n \le N$, we will prove that $\phi_n(0, x_{n-1}, \ldots, x_1)$ is bounded for any $0 \le x_{n-1} \le \cdots \le x_1$. In order to prove that $\phi_n(0, x_{n-1}, \ldots, x_1)$ is bounded, consider the multiple integral [cf. (45)]

\[ I_n = \int_{v_{n-1}=0}^{x_{n-1}} \int_{v_{n-2}=v_{n-1}}^{x_{n-2}} \cdots \int_{v_1=v_2}^{x_1} \exp(-v_1)\, dv_1 \ldots dv_{n-1}. \]

Integrating over $v_1$, we obtain

\[ I_n = \int_{v_{n-1}=0}^{x_{n-1}} \int_{v_{n-2}=v_{n-1}}^{x_{n-2}} \cdots \int_{v_2=v_3}^{x_2} \exp(-v_2)\, dv_2 \ldots dv_{n-1} - \exp(-x_1) \int_{v_{n-1}=0}^{x_{n-1}} \int_{v_{n-2}=v_{n-1}}^{x_{n-2}} \cdots \int_{v_2=v_3}^{x_2} dv_2 \ldots dv_{n-1} \]

\[ I_n = I_{n,1} - \exp(-x_1)\, B_{n,1}(x_{n-1}, \ldots, x_2). \]

Observe that the first multiple integral on the right-hand side (RHS), which is denoted $I_{n,1}$, has the same form as $I_n$. Due to $x_1 \ge x_2 \ge \cdots \ge x_{n-1} \ge 0$, we have

\[ B_{n,1}(x_{n-1}, \ldots, x_2) \le \int_{v_{n-1}=0}^{x_1} \int_{v_{n-2}=0}^{x_1} \cdots \int_{v_2=0}^{x_1} dv_2 \ldots dv_{n-1}. \]

Therefore

\[ B_{n,1}(x_{n-1}, \ldots, x_2) \le x_1^{n-2}. \]

Note that $\exp(-x_1)\, x_1^{n-2}$ is bounded for all $x_1 \ge 0$ so that $\exp(-x_1)\, B_{n,1}(x_{n-1}, \ldots, x_2)$ is also bounded. Integrating over all dummy variables, we obtain

\[ I_n = 1 - \exp(-x_{n-1}) - \sum_{j=1}^{n-2} \exp(-x_j)\, B_{n,j}(x_{n-1}, \ldots, x_{j+1}) \]

where

\[ B_{n,j}(x_{n-1}, \ldots, x_{j+1}) = \int_{v_{n-1}=0}^{x_{n-1}} \cdots \int_{v_{j+1}=v_{j+2}}^{x_{j+1}} dv_{j+1} \ldots dv_{n-1}. \]

It can be shown that $\exp(-x_j)\, B_{n,j}(x_{n-1}, \ldots, x_{j+1})$ is bounded by the same argument as for $\exp(-x_1)\, B_{n,1}(x_{n-1}, \ldots, x_2)$. Therefore, $\phi_n(0, x_{n-1}, \ldots, x_1)$ is bounded for all $x_1 \ge x_2 \ge \cdots \ge x_{n-1} \ge 0$.

Then, from (45), it follows that

\[ \phi_n(0, x_{n-1}, \ldots, x_1) = \begin{cases} \dfrac{1}{\Gamma(N-n+1)} \displaystyle\int_{v_{n-1}=0}^{x_{n-1}} \cdots \int_{v_1=v_2}^{x_1} \exp(-v_1)\, dv_1 \ldots dv_{n-1}, & n = N \\ 0, & n < N. \end{cases} \]

If $n < N$, then from (43), it follows that $f_{d_n}(0) = 0$. If $n = N$, then applying the mean-value theorem, we obtain

\[ \lim_{x_n \to 0} \int_{y=0}^{x_n} \phi_n(y, x_{n-1}, \ldots, x_1)\, dy = \lim_{x_n \to 0} \phi_n(0, x_{n-1}, \ldots, x_1)\, x_n. \]

Since $\phi_n(0, x_{n-1}, \ldots, x_1)$ is bounded

\[ \lim_{x_n \to 0} \int_{y=0}^{x_n} \phi_n(y, x_{n-1}, \ldots, x_1)\, dy = 0. \]

Finally, from (43) and $n = N \le M$, it follows that $f_{d_N}(0) = 0$. ■

ABSTRACT

We consider the problem of tracking the frequency and complex amplitude of a time-varying (TV) harmonic signal using particle filtering (PF) tools. Similar to previous PF approaches to TV spectral analysis, we assume that the frequency and complex amplitude evolve according to a Gaussian AR(1) model; but we concentrate on the important special case of a single TV harmonic. For this case, we show that the optimal importance function (that minimizes the variance of the particle weights) can be computed in closed form. We also develop a suitable procedure to sample from the optimal importance function. The end result is a custom PF solution that is more efficient than generic ones, and can be used in a broad range of important applications that postulate a single TV harmonic component, e.g., TV Doppler estimation in communications and radar.

1.  INTRODUCTION 

Spectral analysis and time-frequency analysis are core tools in signal processing research (e.g., [10, 3]). Time-varying (TV) spectra arise in a broad range of important applications: from speech, to radar, to wireless channel modeling and estimation.

TV spectral analysis tools range from basic non-parametric approaches such as the spectrogram, to the Wigner-Ville and other time-frequency distributions, and on to parametric ones such as polynomial basis expansion models, and TV line spectra mixture models.

Line spectra mixtures (whether stationary or TV) entail a nonlinear observation equation, which complicates parameter estimation. When the evolution of model parameters can be captured in state-space form, particle filtering (PF) tools become particularly appealing for tracking the model parameters. For a multicomponent TV harmonic mixture model, PF approaches have been pursued in [1, 7]. In [1], the evolution of harmonic parameters (frequencies, complex amplitudes, possibly also decay rates) is modeled using a Gaussian auto-regressive (AR) process, and an improved auxiliary particle filtering algorithm is applied to track the parameters. In [7], a similar Gaussian random walk model is used for the evolution of the parameters. Unlike [1], temporal slices of the spectrogram are used in the measurement equation of [7] (which apparently limits the attainable time-frequency resolution), and an unscented PF algorithm is adapted to track the model parameters.

Gaussian AR models of the evolution of harmonic mixture parameters are plausible and convenient in many situations, e.g., they can capture smoothness due to inertia or other physical constraints. Following [1, 7], we also assume that the frequency and complex amplitude evolve according to a Gaussian AR(1) model; but we concentrate on the important special case of a single TV harmonic. For this case, we show that the optimal importance function (that minimizes the variance of the particle weights) can be computed in closed form. We also develop a suitable procedure to sample from the optimal importance function. The end result is a custom PF solution that is more efficient than generic ones, and can be used in a broad range of important applications that postulate a single TV harmonic component, e.g., TV Doppler estimation in communications and radar.

2.  DATA  MODEL 

Let $\mathbf{x}_k := [\omega_k, A_k]^T$ denote the state at time $k$, where $\omega_k \in \mathbb{R}$ and $A_k \in \mathbb{C}$ denote instantaneous frequency and complex amplitude. The state evolves according to the following AR(1) model:

\[ \mathbf{x}_k = \mathbf{H} \mathbf{x}_{k-1} + [u_{k-1} \ \ w_{k-1}]^T \]

where $\mathbf{H}$ is $2 \times 2$ diagonal, $\mathbf{H} = \mathrm{diag}\left( [b_1, b_2]^T \right)$, with $b_\ell$ equal to $1 - \epsilon_\ell$ (e.g., 0.999). The process noise sequence is i.i.d. The process noise vector at time $k$ consists of two independent random variables with the following marginal statistics:

\[ [u_{k-1} \ \ w_{k-1}]^T \sim [\, \mathcal{N}(0, \sigma_\omega^2), \ \mathcal{CN}(0, 2\sigma_A^2) \,]^T, \]

where $\mathcal{N}$, $\mathcal{CN}$ stand for the (real) normal and circularly symmetric complex normal distribution, respectively. The measurements are related to the state via the measurement equation

\[ y_k = \mathbf{x}_k(2)\, e^{j \mathbf{x}_k(1) k} + v_k, \]

where $v_k$ denotes i.i.d. $\mathcal{CN}(0, 2\sigma_n^2)$ measurement noise.

Given a sequence of observations $\{y_k\}_{k=1}^{T}$, the problem of interest is to estimate the sequence of posterior densities, that is $p(\mathbf{x}_k | \{y_i\}_{i=1}^{k})$, $k \in \{1, \ldots, T\}$. Given $p(\mathbf{x}_k | \{y_i\}_{i=1}^{k})$, one can estimate $\mathbf{x}_k$ via the associated (posterior) mean, or mode.
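To make the data model concrete, the following sketch draws one realization of the state and measurement sequences. All numerical values (noise levels, AR coefficients, initial state) and the function name `simulate` are illustrative assumptions; the source does not specify them here beyond the example value 0.999 for the AR coefficients.

```python
import numpy as np

def simulate(T, b1=0.999, b2=0.999, sigma_w=0.01, sigma_A=0.05,
             sigma_n=0.1, omega0=0.5, A0=1.0 + 0.0j, seed=0):
    """Draw (omega_k, A_k, y_k), k = 1..T, from the AR(1) state model and
    the measurement equation y_k = A_k exp(j omega_k k) + v_k."""
    rng = np.random.default_rng(seed)

    def cnormal(sigma):
        # CN(0, 2 sigma^2): real and imaginary parts each have variance sigma^2.
        return sigma * (rng.standard_normal() + 1j * rng.standard_normal())

    omega = np.empty(T)
    A = np.empty(T, dtype=complex)
    y = np.empty(T, dtype=complex)
    om_prev, A_prev = omega0, A0          # hypothetical initial state
    for k in range(1, T + 1):
        om_k = b1 * om_prev + sigma_w * rng.standard_normal()
        A_k = b2 * A_prev + cnormal(sigma_A)
        y[k - 1] = A_k * np.exp(1j * om_k * k) + cnormal(sigma_n)
        omega[k - 1], A[k - 1] = om_k, A_k
        om_prev, A_prev = om_k, A_k
    return omega, A, y

omega, A, y = simulate(T=200)
```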


3.  PARTICLE  FILTERING 

Particle filtering has emerged as an important sequential state estimation method for stochastic non-linear and/or non-Gaussian state-space models, for which it provides a powerful alternative to the commonly used extended Kalman filter. See [2, 5, 6] for recent tutorial overviews.

In particle filtering, continuous distributions are approximated by discrete random measures, comprising "particles" and associated weights. That is, a certain continuous distribution of interest, say $p(\mathbf{x})$, is approximated as

\[ p(\mathbf{x}) \approx \sum_{n} w_n\, \delta(\mathbf{x} - \mathbf{x}_n), \]

where $\delta(\cdot)$ denotes the Dirac delta functional. A useful simplification stemming from this approximation is that the computation of pertinent expectations and conditional probabilities reduces to summation, as opposed to integration. While this can also be accomplished via direct discretization over a fixed grid, the use of a random measure affords flexibility in adapting the particle locations to better fit the distribution of interest.
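The reduction of expectations to weighted sums can be illustrated with a toy importance-sampling example. The target and proposal below (a standard normal target, a wider shifted normal proposal) are arbitrary assumptions chosen only so that the exact answer, $E[x^2] = 1$, is known; they are unrelated to the tracking model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles = 20_000

# Particles x_n drawn from a proposal q = N(1, 2^2) (assumption for illustration).
particles = rng.normal(loc=1.0, scale=2.0, size=n_particles)

def log_normal_pdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

# Unnormalized weights p(x_n) / q(x_n) for the target p = N(0, 1), then normalize.
log_w = log_normal_pdf(particles, 0.0, 1.0) - log_normal_pdf(particles, 1.0, 2.0)
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

# Expectation under p approximated by a weighted sum over the particles.
second_moment = float(np.sum(weights * particles ** 2))
```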

Different types of particle filters may be applied to a given state-space model. The various particle filters primarily differ in the choice of so-called importance (or, proposal) function. Different importance functions yield different estimation performance-complexity trade-offs. From the viewpoint of minimizing the variance of the weights, the optimal importance function is given by [2, 5]

\[ p(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) = \frac{p(y_k | \mathbf{x}_k)\, p(\mathbf{x}_k | \mathbf{x}_{n,k-1})}{\int_{\mathbf{x}} p(y_k | \mathbf{x})\, p(\mathbf{x} | \mathbf{x}_{n,k-1})\, d\mathbf{x}}, \]

where $\mathbf{x}_{n,k} := [\omega_{n,k}, A_{n,k}]^T$ denotes the $n$-th particle at time $k$. The optimal importance function usually strikes a better performance-complexity trade-off than other alternatives. There are, however, two difficulties associated with the use of the optimal importance function. First and foremost, it requires multidimensional integration to compute the normalization factor, which is usually intractable. Second, sampling from the optimal importance function is a rather complicated process. Thankfully, for our particular model, it turns out that it is possible to carry out the integration analytically. This is explained next.

Define  a  dummy  variable  x  :=  [cu,  A]T,  and  let  D(yk,  xn^_i) 
/Xp(yfc|x)p(x|xn,fc-i)<ix.  Then 


D(yk  i  Xn,fe- i )  — 


/  / 

Jujevi  J  a 


€5?  JAeC  27rcrn 


(o;-b1a;n  fc-1)2 


I  A~b2  An,k-1  I 


Letting  tua  ■=  b2An^-i  ,  mw  :=  biu)n,k-i,  v  ~  4>a  -  <t>vh ,  it 

can  be  shown  that 


$$D(y_k, \mathbf{x}_{n,k-1}) = \frac{1}{2\pi(\sigma_A^2 + \sigma_n^2)} \, e^{-\frac{|y_k|^2 + |m_A|^2}{2(\sigma_A^2 + \sigma_n^2)}} \times B,$$


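with the multiplicative factor B given by

$$B = I_0\!\left(\frac{|m_A|\,|y_k|}{\sigma_A^2 + \sigma_n^2}\right) + 2 \sum_{m=1}^{+\infty} (-1)^m I_m\!\left(-\frac{|m_A|\,|y_k|}{\sigma_A^2 + \sigma_n^2}\right) e^{-\frac{m^2 k^2 \sigma_\omega^2}{2}} \cos(m k m_\omega - m\nu),$$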

where  Im(-)  denotes  the  modified  Bessel  function  of  the  first  kind. 
The  sum  term  above  is  quite  interesting.  Due  to  the  negative  ex¬ 
ponential  dependence  on  the  time  index  k  and  the  properties  of 
Bessel  functions,  it  vanishes  quickly  with  k  -  only  the  zero-order 
Bessel  term  remains. 

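We use rejection [4, pp. 40-42] to generate samples from the
optimal importance function

$$p(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) = \frac{p(y_k | \mathbf{x}_k)\, p(\mathbf{x}_k | \mathbf{x}_{n,k-1})}{\frac{1}{2\pi(\sigma_A^2 + \sigma_n^2)} \, e^{-\frac{|y_k|^2 + |m_A|^2}{2(\sigma_A^2 + \sigma_n^2)}} \, B}.$$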


Let  : 


2  :=  and  „  ,=  Using  the  triangle 


inequality,  it  can  be  shown  that  a  suitable  dominating  density  is 


f?(xfc  |xn)fc_i ,  yk)  — 


(|Afc|-fc)2 

e  2<,w  e  zv2 
(27t)2Q0  o^o 


\/2nQo 
Qo  :=  dr  =  J  erfc(  ^

Jr= o  sphta  2  OA<Jn\j2(<j2A  +  a2) 

Fig.  1  shows  a  typical  plot  of  the  dominated  and  dominating  den¬ 
sities,  illustrating  the  tightness  of  the  bounding  step.  The  overall 
algorithm is summarized in Table 1.


4.  CRAMER-RAO  LOWER  BOUND 

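The Cramer-Rao Lower Bound (CRLB) for our model can be computed
using the recursive formula of Tichavsky et al. [11] for the
calculation of the Fisher information matrix, $\mathbf{J}_k$. The state
equation in our particular model is linear, Gaussian; this allows
considerable simplification of the general result in [11], thus yielding

$$\mathbf{J}_k = \mathbf{D}^{22}_{k-1} - \mathbf{D}^{21}_{k-1} \left(\mathbf{J}_{k-1} + \mathbf{D}^{11}_{k-1}\right)^{-1} \mathbf{D}^{12}_{k-1}, \quad k > 0,$$

where

$$\mathbf{D}^{11}_{k-1} := -E\{\nabla_{\mathbf{x}_{k-1}} [\nabla_{\mathbf{x}_{k-1}} \log p(\mathbf{x}_k | \mathbf{x}_{k-1})]^T\},$$

$$\mathbf{D}^{21}_{k-1} := [\mathbf{D}^{12}_{k-1}]^T = -E\{\nabla_{\mathbf{x}_k} [\nabla_{\mathbf{x}_{k-1}} \log p(\mathbf{x}_k | \mathbf{x}_{k-1})]^T\},$$

$$\mathbf{D}^{22}_{k-1} := -E\{\nabla_{\mathbf{x}_k} [\nabla_{\mathbf{x}_k} \log p(\mathbf{x}_k | \mathbf{x}_{k-1})]^T\} - E\{\nabla_{\mathbf{x}_k} [\nabla_{\mathbf{x}_k} \log p(y_k | \mathbf{x}_k)]^T\}.$$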

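At this point, it is convenient to rewrite our model in real-valued
form. Upon defining $\mathbf{x}'_k := [\omega_k, \Re(A_k), \Im(A_k)]^T$, where $\Re(\cdot)$, $\Im(\cdot)$
extract the real, resp. imaginary part, we have

$$\mathbf{x}'_k = \mathbf{H}' \mathbf{x}'_{k-1} + \mathbf{u}_{k-1},$$

$$\mathbf{y}'_k = \left[ \Re\{A_k e^{j\omega_k k}\}, \; \Im\{A_k e^{j\omega_k k}\} \right]^T + \mathbf{v}_k,$$

where $\mathbf{H}' = \mathrm{diag}([b_1, b_2, b_3]^T)$, with $b_\ell$ being $1 - \epsilon_\ell$, $\mathbf{u}_{k-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$
with $\mathbf{Q} = \mathrm{diag}([\sigma_\omega^2, \sigma_A^2, \sigma_A^2]^T)$, and $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$
with $\mathbf{R} = \mathrm{diag}([\sigma_n^2, \sigma_n^2]^T)$. Then

$$\mathbf{D}^{11}_{k-1} = \mathbf{H}'^T \mathbf{Q}^{-1} \mathbf{H}',$$

$$\mathbf{D}^{12}_{k-1} = [\mathbf{D}^{21}_{k-1}]^T := -\mathbf{H}'^T \mathbf{Q}^{-1},$$

$$\mathbf{D}^{22}_{k-1} = \mathbf{Q}^{-1} + E\{\mathbf{F}_k^T \mathbf{R}^{-1} \mathbf{F}_k\},$$

with $\mathbf{F}_k$ being the 2x3 matrix

$$\mathbf{F}_k = \nabla_{\mathbf{x}'_k} \left[ \Re\{A_k e^{j\omega_k k}\} \;\; \Im\{A_k e^{j\omega_k k}\} \right]^T.$$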

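For $\mathbf{D}^{11}_{k-1}$ and $\mathbf{D}^{12}_{k-1}$, note that the expectation operator was dropped
because the respective Jacobians are independent of the target state.
The expectation operator in the expression for $\mathbf{D}^{22}_{k-1}$ can be easily
estimated using MC integration; it can also be calculated analytically,
although the resulting formula appears cumbersome. Putting
terms together yields

$$\mathbf{J}_k = \mathbf{Q}^{-1} + E\{\mathbf{F}_k^T \mathbf{R}^{-1} \mathbf{F}_k\} - \mathbf{Q}^{-1} \mathbf{H}' \left(\mathbf{J}_{k-1} + \mathbf{H}'^T \mathbf{Q}^{-1} \mathbf{H}'\right)^{-1} \mathbf{H}'^T \mathbf{Q}^{-1}, \quad k > 0.$$

The initial density $p(\mathbf{x}_0)$ is taken to be $\mathcal{N}(\mathbf{x}_0, \mathbf{Q}_0)$, in which case
$\mathbf{J}_0 = \mathbf{Q}_0^{-1}$.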
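
The information recursion above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the real-valued model with diagonal H', Q and R, approximates the expectation of F_k^T R^{-1} F_k by Monte-Carlo averaging over state realizations started from a known x0, and uses small pure-Python matrix helpers. All function and parameter names (`crlb_recursion`, `jacobian`, `n_mc`, and so on) are hypothetical.

```python
import math
import random


def mat_mul(a, b):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mat_add(a, b, sign=1.0):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_inv(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for 3x3)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def jacobian(omega, a_re, a_im, k):
    """2x3 Jacobian of [Re, Im]{A exp(j*omega*k)} w.r.t. [omega, Re A, Im A]."""
    c, s = math.cos(omega * k), math.sin(omega * k)
    return [[-k * (a_re * s + a_im * c), c, -s],
            [k * (a_re * c - a_im * s), s, c]]


def crlb_recursion(b, q_diag, r_diag, x0, j0, steps, n_mc=100, seed=0):
    """Return the bound matrices J_k^{-1} for k = 1..steps."""
    rng = random.Random(seed)
    h_mat = diag(b)
    q_inv = diag([1.0 / q for q in q_diag])
    r_inv = diag([1.0 / r for r in r_diag])
    qh = mat_mul(q_inv, h_mat)                 # Q^{-1} H'
    d11 = mat_mul(transpose(h_mat), qh)        # H'^T Q^{-1} H'
    states = [list(x0) for _ in range(n_mc)]   # MC realizations (assumption)
    j_mat, bounds = j0, []
    for k in range(1, steps + 1):
        expect = diag([0.0, 0.0, 0.0])
        for x in states:
            for i in range(3):                 # x'_k = H' x'_{k-1} + u_{k-1}
                x[i] = b[i] * x[i] + rng.gauss(0.0, math.sqrt(q_diag[i]))
            f = jacobian(x[0], x[1], x[2], k)
            expect = mat_add(expect, mat_mul(mat_mul(transpose(f), r_inv), f))
        expect = [[v / n_mc for v in row] for row in expect]
        inner = mat_inv(mat_add(j_mat, d11))
        corr = mat_mul(mat_mul(qh, inner), transpose(qh))
        j_mat = mat_add(mat_add(q_inv, expect), corr, sign=-1.0)
        bounds.append(mat_inv(j_mat))
    return bounds
```

In this sketch the frequency bound at time k is the (0, 0) entry of `bounds[k - 1]`. A convenient sanity check is that, when the measurement noise is made very large, the recursion reduces to the prediction step P_k = H' P_{k-1} H'^T + Q.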

5.  SIMULATIONS 

In  our  simulations,  we  benchmark  the  performance  of  our  optimal 
particle  filter  against  the  CRLB  and  two  additional  particle  filters: 
an  Auxiliary  PF,  and  a  regularized  PF.  The  three  alternative  particle 
filters  are  briefly  discussed  next. 

5.1.  Regularized  PF  (RPF) 

This  algorithm  is  identical  to  the  Sampling  Importance  Resam¬ 
pling  (SIR)  algorithm,  which  uses  the  prior  importance  function, 
except  for  a  “jittering”  of  the  resampled  particles  (using  a  normal 
distribution kernel) in order to protect the filter from sample
impoverishment; see, e.g., [2]. Since the process noise involved in
our  model  is  relatively  small,  this  modification  is  expected  to  im¬ 
prove  the  performance  over  the  standard  SIR.  However,  this  filter 
also  has  well  known  disadvantages  -  the  samples  are  no  longer 
guaranteed  to  approximate  the  posterior  density  asymptotically  in 
the  number  of  particles. 

5.2.  Auxiliary  SIR  (ASIR)  Filter 

The  particular  algorithm  used  is  the  Auxiliary  SIR  filter  introduced 
by  Pitt  and  Shephard  (see  [9]).  This  filter  tries  to  explore  the  state- 
space  in  a  more  sophisticated  way  than  the  SIR  filter.  This  is  done 
by  resampling  at  the  “previous”  time  step  based  on  certain  point 
estimates  that  capture  the  essential  features  of  the  posterior  density. 
This  approximation  can  be  inefficient  when  the  process  noise  is 
large,  or  when  the  auxiliary  index  varies  a  lot  for  a  fixed  prior. 
When  process  noise  is  small  enough,  though,  the  ASIR  filter  is 
reported  to  improve  the  performance  over  the  standard  SIR. 


5.3.  PF  Using  Optimal  Importance  Function  (PF-OIF) 

For our particular model and choice of sampling procedure, an
implementation is given in Table 1. Note that this algorithm allows
both  the  weight  update  and  the  resampling  step  to  be  performed 
prior  to  sampling  from  the  optimal  importance  function.  An  ad¬ 
ditional  regularization  step  can  be  incorporated,  if  necessary,  to 
improve  the  filter’s  diversity  after  resampling. 


5.4.  Estimation  performance  results 

In  the  following,  we  focus  on  the  frequency  estimation  perfor¬ 
mance  of  the  three  aforementioned  filters  in  a  tracking  mode,  wherein 
the  initial  state  is  assumed  to  be  known  exactly  -  corresponding  to  a 
Dirac  delta  initial  distribution.  The  associated  CRLB,  however,  as¬ 
sumes  that  the  initial  density  is  a  Gaussian.  This  mismatch  is  dealt 
with  by  using  a  very  tight  density  (very  small  initial  variance)  to 
approximate  a  delta  distribution.  The  expectation  appearing  in  the 
CRLB  was  approximated  using  100  realizations  of  the  state  vector. 
The  error  curves  corresponding  to  the  three  filters  were  produced 
by  averaging  over  200  independent  runs,  each  comprising  80  tem¬ 
poral  samples.  The  conditional  mean  was  used  to  generate  point 
state estimates. System parameters were set to $b_\ell = 0.999, \forall \ell$,
$\sigma_\omega = 0.01$, $\sigma_A = 0.01$, $\sigma_n^2 = 0.2$, and multinomial resampling
was employed. The number of particles, N, was 1000 for RPF,
800 for ASIR, and 30 for PF-OIF. The results are summarized in
Fig. 2. It is satisfying to see that all three filters operate close to
the CRLB, and PF-OIF in particular performs that well with an order
of magnitude fewer particles. This being a three-dimensional state
space, such good performance with only 30 particles is not at all
obvious. RPF and ASIR filters perform very poorly with fewer than
a few hundred particles in this context. A small number of particles
implies small memory requirements, but on the other hand the
use of rejection in our present implementation of PF-OIF entails a
random delay, which can be significant, depending on system
parameters. We are presently looking at possible ways of speeding
up the sampling step.


6.  CONCLUSIONS 

We revisited the important problem of tracking a single time-varying
harmonic, whose frequency and complex amplitude evolve according
to a linear Gaussian separable AR(1) model. A key difficulty in
treating this model comes from the nonlinear measurement equation.
For this model, we derived the optimal importance function
in closed form. This yields interesting insights and opens up the
possibility of designing particle filters that are more efficient than
generic ones. We also derived a procedure to sample from this
optimal importance function, using rejection and the concept of a
dominating density. Our preliminary numerical experiments
comparing the resulting filter to standard particle filters and the CRLB
show that the proposed PF-OIF algorithm has merits, particularly
in terms of reducing the number of particles, and therefore memory
requirements as well. Our present implementation of PF-OIF
can be slow, due to the use of rejection. We are currently looking
at other alternatives as well as extensions to more general signal
models.
Fig. 1. Illustration of dominated (optimal importance function)
and dominating densities as a function of frequency for fixed
complex amplitude.


Fig.  2.  Comparison  of  the  three  particle  filters  and  the  CRLB 


Table 1. PF using OIF for Tracking a Single Time-Varying Harmonic
(see text for definition of constants)

$[\{\mathbf{x}_k^i\}_{i=1}^N] = \text{PF-OIF}\,[\{\mathbf{x}_{k-1}^i\}_{i=1}^N, y_k]$

1. Compute normalized importance weights

• FOR i=1:N,

  - $w_k^i = \frac{1}{2\pi(\sigma_A^2 + \sigma_n^2)} \, e^{-\frac{|y_k|^2 + |b_2 A_{k-1}^i|^2}{2(\sigma_A^2 + \sigma_n^2)}} \times B$

• END FOR

• FOR i=1:N,

  - Normalize: $w_k^i = w_k^i / \mathrm{sum}[\{w_k^i\}_{i=1}^N]$

• END FOR

2. Resample -> equally weighted particles

$[\{\mathbf{x}_{k-1}^i\}_{i=1}^N] = \text{RESAMPLE}\,[\{\mathbf{x}_{k-1}^i, w_k^i\}_{i=1}^N]$

3. Sample from the optimal importance density:

• FOR i=1:N,

  - Calculate $c$ (see text)

  - Set $U := 1/\text{eps}$ and $\tau := 1/\text{eps}$

  • WHILE ($U\tau > 1$)

    - Draw candidate sample $\mathbf{x}_k^*$ ~ dominating density (see text)

    - Set $\tau := c \, \text{Dominating}(\mathbf{x}_k^*) / \text{Optimal}(\mathbf{x}_k^*)$

    - Draw a sample $U$ ~ Uniform[0, 1]

  • END WHILE

• END FOR
ABSTRACT

The  problem  of  tracking  the  frequency  and  complex  amplitude  of 
a  frequency-hopped  complex  sinusoid  is  considered,  using  a  novel 
stochastic  state-space  formulation  and  particle  filtering  tools.  The 
problem  is  of  considerable  interest  for  interference  mitigation  in 
frequency-hopped  wireless  networks,  and  in  military  communica¬ 
tions.  The  proposed  particle  filtering  approach  has  a  number  of 
desirable  features.  It  affords  high-resolution  estimates  of  carrier 
frequency  and  hop  timing,  manageable  complexity  (linear  in  the 
number  of  processed  samples),  and  flexibility  in  tracking  signals 
with  irregular  hopping  patterns  due  to  intentional  timing  jitter.  The 
proposed  state-space  model  is  not  only  parsimonious,  but  fortu¬ 
itous  as  well:  it  turns  out  that  the  associated  optimal  importance 
function  can  be  computed  in  closed  form,  and  thus  samples  from 
it  can  be  drawn  using  rejection  techniques.  Both  prior  and  opti¬ 
mal  importance  sampling  versions  are  developed  and  illustrated  in 
pertinent  simulations. 

Keywords:  Frequency  hopping,  spectral  analysis,  estimation 
of  time-varying  line  spectra,  sequential  importance  sampling,  par¬ 
ticle  filtering 


1.  INTRODUCTION 

Tracking  the  frequency  of  a  time-varying  complex  sinusoid  is  an 
important  problem  which  arises  in  numerous  applications.  In  speech 
processing,  for  example,  one  is  often  interested  in  tracking  formant 
frequencies.  In  wireless  communications,  it  arises  in  the  context  of 
frequency  hopping,  when  the  receiver  has  no  prior  knowledge  of 
the  hopping  pattern,  or  is  simply  out  of  sync  with  the  transmitter’s 
hopping  pattern  generator  [2,  8,  6,  7]. 

Both  non-parametric  time-frequency  analysis,  and  paramet¬ 
ric  techniques  have  been  developed  for  the  more  general  prob¬ 
lem  of  tracking  a  time-varying  sinusoid,  and  can  be  applied  to  the 
problem  of  tracking  a  frequency-hopped  sinusoid  as  well.  How¬ 
ever,  existing  methods  have  limitations,  especially  when  used  to 
track  a  frequency-hopped  signal.  Non-parametric  methods,  like 
the  spectrogram,  or  coarse  channelization  [2]  suffer  from  limited 
frequency-  and  temporal-resolution  due  to  leakage.  It  is  possible 
to  employ  time-frequency  distributions  that  are  better-adapted  to 
frequency  hopping  [3],  but  the  results  are  still  not  very  satisfac¬ 
tory.  Parametric  methods  for  frequency  hopping  explicitly  model 
the  frequency  as  piecewise-constant,  assume  a  “budget”  on  the 

*  Corresponding  author.  Supported  in  part  by  the  Army  Research  Lab¬ 
oratory  (ARL)  through  participation  in  the  ARL  Collaborative  Technology 
Alliance (ARL-CTA) for Communications and Networks under Cooperative
Agreement DAAD19-01-2-0011, and in part by ARO under ERO Contract
N62558-03-C-0012.


number  of  hops  within  a  given  observation  interval,  and  employ 
dynamic  programming  to  track  the  sought  frequency  and  complex 
amplitude  parameters  [6,  7].  Other  than  an  upper  bound  on  the 
number  of  hops,  the  methods  in  [6,  7]  do  not  assume  anything  else 
about  the  frequencies  or  complex  amplitudes,  which  are  treated  as 
deterministic  unknowns. 

A  different  viewpoint  is  adopted  in  this  paper.  A  stochas¬ 
tic  non-linear,  non-Gaussian  state-space  formulation  is  proposed, 
which  captures  frequency  hopping  dynamics  in  a  probabilistic  sense. 
The  proposed  formulation  is  naturally  well-suited  for  the  applica¬ 
tion  of  particle  filtering  for  state  estimation.  Compared  to  the  prior 
state-of-art  in  [6,  7],  the  new  approach  has  a  number  of  desirable 
features: 

•  Computational  complexity:  The  complexity  of  particle  fil¬ 
tering  is  O(NT),  where  N  is  the  number  of  particles  and  T  is 
the number of temporal samples. The complexity of dynamic
programming, on the other hand, is roughly $O(T^4)$. This means that
only  short  segments  can  be  processed  by  dynamic  programming, 
and  then  one  has  to  rely  on  hop  periodicity  to  segment  the  rest  of 
the  data.  This  has  two  disadvantages:  first,  the  more  samples  are 
processed  the  better  from  an  estimation  performance  perspective; 
second,  hop  timing  is  often  intentionally  randomized  as  a  counter¬ 
measure. 

•  Flexibility:  The  state-space  model  in  the  particle  filtering 
formulation  can  be  easily  tailored  to  match  a  given  scenario  (e.g., 
spread  bandwidth  and  modulation). 

The  proposed  state-space  model  is  simple  and  fortuitous:  the 
associated  optimal  importance  function  can  be  computed  in  closed 
form,  and  thus  samples  from  it  can  be  drawn  using  rejection  tech¬ 
niques.  Both  prior  and  optimal  importance  sampling  versions  are 
developed  and  compared  in  pertinent  simulations. 


2.  DATA  MODEL  AND  PROBLEM  STATEMENT 


We propose the following non-linear non-Gaussian stochastic
state-space model of a frequency-hopped complex sinusoid. Let
$\mathbf{x}_k := [\omega_k, A_k]^T$ denote the state at time k, where $\omega_k \in [-\pi, \pi)$ and
$A_k \in \mathbb{C}$ denote instantaneous frequency and complex amplitude.
Let $\mathbf{u}_k := [b_k, \tilde{\omega}_k, \tilde{A}_k]^T$ denote an auxiliary sequence of independent
and identically distributed (i.i.d.) vectors with independent
components and the following marginal statistics: $b_k$ is a binary
random variable with $\Pr(b_k = 1) = h$; $\tilde{\omega}_k$ is uniformly distributed
over $[-\pi, \pi)$, denoted $\mathcal{U}([-\pi, \pi))$; and $\tilde{A}_k$ is $\mathcal{CN}(0, \sigma_A^2)$,
i.e., complex circular Gaussian of variance $\sigma_A^2$. Then


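$$\mathbf{x}_k = f(\mathbf{x}_{k-1}, \mathbf{u}_k) = \begin{cases} \mathbf{x}_{k-1}, & \mathbf{u}_k(1) = 0 \\ [\mathbf{u}_k(2), \mathbf{u}_k(3)]^T, & \mathbf{u}_k(1) = 1 \end{cases} = \begin{cases} \mathbf{x}_{k-1}, & \text{w.p. } 1 - h \\ [\mathcal{U}([-\pi, \pi)), \mathcal{CN}(0, \sigma_A^2)]^T, & \text{w.p. } h \end{cases}$$

$$y_k = \mathbf{x}_k(2)\, e^{j \mathbf{x}_k(1) k} + v_k,$$

where $v_k$ denotes i.i.d. $\mathcal{CN}(0, \sigma_n^2)$ measurement noise, and $\mathbf{u}_k(1)$
the hop variable.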

The  above  state-space  formulation  models  frequency  hopping 
in  a  probabilistic  fashion.  Hops  are  random,  i.i.d.,  with  hop  proba¬ 
bility  h  per  sample  interval.  This  is  different  from  traditional  mod¬ 
els  of  frequency  hopping,  which  assume  that  the  frequency  hops 
periodically,  and  is  motivated  by  the  following  considerations: 

•  In  military  communications,  intentional  jitter  is  often  intro¬ 
duced  in  the  hop  timing  in  order  to  reduce  the  probability 
of  detection  by  unintended  receivers  and  improve  immunity 
to  jamming.  Timing  jitter  yields  a  pseudo-random  quasi- 
periodic,  or  even  seemingly  aperiodic  hop  timing  sequence. 

•  The  above  probabilistic  model  captures  information  about 
the  average  hop  rate  in  a  “soft"  ensemble  sense:  the  ex¬ 
pected  number  of  hops  over  a  long  observation  interval  T  is 
hT.  While  less  accurate  if  the  exact  hop  period  is  known, 
probabilistic  modeling  is  more  robust  with  respect  to  hop 
period  inaccuracies.  Finally, 

•  The  proposed  probabilistic  model  is  ideally  suited  for  on¬ 
line  sequential  estimation  via  particle  filtering. 

It  is  worth  elaborating  on  some  of  the  implicit  assumptions  of 
the  proposed  state-space  model. 

1.  When  the  (discrete-time,  baseband-equivalent)  frequency 
hops, it hops anywhere within $[-\pi, \pi)$ with a uniform
density. This is well-suited for carrier hopping, which is usually
discontinuous. Modulation-induced variations can (and
should)  be  neglected  when  the  objective  is  to  estimate  car¬ 
rier  frequency,  but  could  also  be  explicitly  modeled  using, 
e.g.,  a  smooth  auto-regressive  frequency  variation  model 
in-between  hops,  in  lieu  of  the  simplified  constant  model 
postulated  above.  This  extension  is  relatively  simple. 

2.  When  the  frequency  hops,  the  complex  amplitude  also  changes 
according  to  an  i.i.d.  complex  Gaussian  distribution.  This  is 
also  well-motivated  for  carrier  hopping,  for  every  time  the 
carrier  frequency  hops  beyond  the  coherence  bandwidth  of 
the  channel,  a  new  channel  realization  is  encountered. 

The  problem,  then,  can  be  stated  as  follows:  Given  a  sequence 
of observations $\{y_k\}_{k=1}^T$, estimate the sequence of system states
$\{\mathbf{x}_k\}_{k=1}^T$ - that is, the unknown carrier frequencies and complex
amplitudes. 
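
To make the data model concrete, the following sketch draws one realization of the state and observation sequences. It is an illustration under stated assumptions rather than the simulation code used here: the function names are hypothetical, the initial state is drawn from the hop distribution, and CN(0, sigma^2) is taken to mean independent real and imaginary parts of variance sigma^2 each, which matches the normalization 1/(2 pi sigma^2) of the densities used in Sec. 3.3.

```python
import cmath
import math
import random


def complex_gaussian(sigma2, rng):
    """Draw from CN(0, sigma2), assuming variance sigma2 per real component."""
    s = math.sqrt(sigma2)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_hopped_sinusoid(T, h, sigma2_a, sigma2_n, seed=0):
    """Return (states, observations) for k = 1..T.

    states[k-1] = (omega_k, A_k); observations[k-1] = y_k.
    """
    rng = random.Random(seed)
    # Assumption: the initial state is drawn as if a hop had just occurred.
    omega = rng.uniform(-math.pi, math.pi)
    amp = complex_gaussian(sigma2_a, rng)
    states, observations = [], []
    for k in range(1, T + 1):
        if rng.random() < h:
            # Hop: new frequency and new complex amplitude.
            omega = rng.uniform(-math.pi, math.pi)
            amp = complex_gaussian(sigma2_a, rng)
        # Measurement equation y_k = A_k exp(j omega_k k) + v_k.
        y = amp * cmath.exp(1j * omega * k) + complex_gaussian(sigma2_n, rng)
        states.append((omega, amp))
        observations.append(y)
    return states, observations
```

For example, `simulate_hopped_sinusoid(100, 0.01, 1.0, 0.2)` produces a record whose expected number of hops is hT = 1.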

3.  PARTICLE  FILTERING  SOLUTIONS 

Particle  filtering  has  emerged  as  an  important  sequential  state  esti¬ 
mation  method  for  stochastic  non-linear  and/or  non-Gaussian  state- 
space  models,  for  which  it  provides  a  powerful  alternative  to  the 
commonly  used  extended  Kalman  filter.  See  [1,  5]  for  recent  tuto¬ 
rial  overviews.  In  particle  filtering,  continuous  distributions  are  ap¬ 
proximated  by  discrete  random  measures,  comprising  “particles” 
and  associated  weights.  That  is,  a  certain  continuous  distribution 
of  interest,  say  p(x),  is  approximated  as 

$$p(\mathbf{x}) \approx \sum_{n=1}^{N} w_n \, \delta(\mathbf{x} - \mathbf{x}_n),$$

where $\delta(\cdot)$ denotes the Dirac delta functional. A useful
simplification stemming from this approximation is that the computation
of  pertinent  expectations  and  conditional  probabilities  reduces  to 
summation,  as  opposed  to  integration.  While  this  can  also  be  ac¬ 
complished  via  direct  discretization  over  a  fixed  grid,  the  use  of 
a  random  measure  affords  flexibility  in  adapting  the  particle  loca¬ 
tions  to  better  fit  the  distribution  of  interest. 
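
As a small illustration of how a random measure turns integration into summation, the sketch below builds a weighted particle set by self-normalized importance sampling and uses it to approximate an expectation. The scalar Gaussian target and proposal are assumptions chosen only so that the true answer is known; the helper names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def random_measure(target_pdf, proposal_pdf, proposal_draw, n):
    """Particles drawn from the proposal, with normalized weights
    proportional to target / proposal evaluated at each particle."""
    particles = [proposal_draw() for _ in range(n)]
    raw = [target_pdf(x) / proposal_pdf(x) for x in particles]
    total = sum(raw)
    return particles, [r / total for r in raw]


def weighted_expectation(func, particles, weights):
    """E{func(x)} approximated by a weighted sum over the particles."""
    return sum(w * func(x) for x, w in zip(particles, weights))


rng = random.Random(0)
particles, weights = random_measure(
    lambda x: normal_pdf(x, 0.0, 1.0),   # target: N(0, 1)
    lambda x: normal_pdf(x, 0.0, 2.0),   # proposal: N(0, 2^2)
    lambda: rng.gauss(0.0, 2.0),
    20000,
)
# The second moment of the target is 1; the weighted sum approximates it.
second_moment = weighted_expectation(lambda x: x * x, particles, weights)
```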

3.1.  Basics  of  particle  filtering 

If we aim for an on-line filtering algorithm, in which the state at
time k should be estimated from measurements up to and including
time k, the key distribution of interest is the posterior density
$p(\mathbf{x}_k | \{y_i\}_{i=1}^k)$. Given this density, one can estimate the
state at time k, e.g., via the associated (posterior) mean, or mode.
The basic idea of particle filtering, then, is to begin with a random
measure approximation of the initial state distribution, and,
as measurements become available, derive updated random measure
approximations of $p(\mathbf{x}_k | \{y_i\}_{i=1}^k)$, $k \in \{1, 2, \ldots\}$. That
is, we seek random measure approximations

$$p(\mathbf{x}_k | \{y_i\}_{i=1}^k) \approx \sum_{n=1}^{N} w_{n,k} \, \delta(\mathbf{x}_k - \mathbf{x}_{n,k}).$$

In particle filtering, the updates - the derivation of $p(\mathbf{x}_k | \{y_i\}_{i=1}^k)$
from $p(\mathbf{x}_{k-1} | \{y_i\}_{i=1}^{k-1})$ - are based on the Bayes rule [1, 5].

A  random  measure  approximation  comprises  two  components: 
the  particles  (locations)  and  the  associated  weights.  If  we  could 
sample from the sought posterior $p(\mathbf{x}_k | \{y_i\}_{i=1}^k)$, then all
particle weights would have been equal. Unfortunately, such direct
sampling  is  not  possible  in  most  cases,  and  thus  we  resort  to  sam¬ 
pling  from  a  so-called  importance  function  that  “resembles”  the 
desired  posterior,  and  from  which  samples  can  be  drawn  with  rel¬ 
ative  ease.  The  mismatch  between  the  sought  density  and  the  im¬ 
portance  function  is  compensated  in  the  calculation  of  weights, 
chosen  proportional  to  their  ratio  evaluated  at  each  particle  [1,  5]. 
The  choice  of  importance  function  is  a  very  important  step  in  the 
design  of  a  particle  filtering  algorithm.  Two  common  choices  are 
discussed  next. 

3.2.  Prior  importance  function 

Perhaps  the  most  intuitive  choice  of  importance  function  is  the 
prior importance function $p(\mathbf{x}_k | \mathbf{x}_{n,k-1})$; i.e., the n-th particle
is updated by propagating it through the state-evolution part of the
system: $\mathbf{x}_{n,k} = f(\mathbf{x}_{n,k-1}, \mathbf{u}_n)$. This is an often-made choice, for
simplicity considerations. The drawback is that particles evolve
without regard to the latest measurement, which only comes into
play in the ensuing weight update. When using the prior importance
function, the said weight update at time instant k is given
by $w_{n,k} = w_{n,k-1}\, p(y_k | \mathbf{x}_{n,k})$, followed by normalization to
enforce $\sum_{n=1}^{N} w_{n,k} = 1$.

Regardless  of  the  particular  importance  function  employed,  a 
common problem in particle filtering is degeneracy: the weights
of all but a few particles tend to become negligible after a few
iterations [1, 5]. Degeneracy can be detected via degeneracy measures,
and mitigated via resampling techniques [1, 5]. Resampling
the discrete measure replicates particles with large weights and
removes those with negligible weights. All particle weights become
equal after resampling. There exist several computationally
efficient ($O(N)$) resampling schemes that can be used to avoid the
quadratic  cost  of  brute-force  resampling  [1,5]. 
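
One time step of the prior-importance-function filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in Sec. 4: it uses systematic resampling as one example of an O(N) scheme (the simulations in this paper use multinomial resampling), it triggers resampling when the effective sample size drops below N/2 (a common degeneracy measure, chosen here only for illustration), and all names are hypothetical. CN(0, sigma^2) is taken to have variance sigma^2 per real component.

```python
import cmath
import math
import random


def likelihood(y, omega, amp, k, sigma2_n):
    """p(y_k | x_k) for y_k = A_k exp(j omega_k k) + v_k."""
    err = abs(y - amp * cmath.exp(1j * omega * k))
    return math.exp(-err * err / (2.0 * sigma2_n)) / (2.0 * math.pi * sigma2_n)


def systematic_resample(weights, rng):
    """O(N) resampling: one uniform offset and N evenly spaced pointers."""
    n = len(weights)
    u0 = rng.random() / n
    indices, i, cum = [], 0, weights[0]
    for m in range(n):
        u = u0 + m / n
        while u >= cum and i < n - 1:
            i += 1
            cum += weights[i]
        indices.append(i)
    return indices


def prior_pf_step(particles, weights, y, k, h, sigma2_a, sigma2_n, rng):
    """Propagate, reweight and (if degenerate) resample the particle set."""
    n = len(particles)
    moved = []
    for omega, amp in particles:
        if rng.random() < h:
            # Hop: draw a new state, ignoring the measurement.
            omega = rng.uniform(-math.pi, math.pi)
            s = math.sqrt(sigma2_a)
            amp = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
        moved.append((omega, amp))
    # Weight update w_{n,k} = w_{n,k-1} p(y_k | x_{n,k}), then normalization.
    raw = [w * likelihood(y, om, a, k, sigma2_n)
           for w, (om, a) in zip(weights, moved)]
    total = sum(raw)
    if total == 0.0:
        new_weights = [1.0 / n] * n   # guard against numerical underflow
    else:
        new_weights = [r / total for r in raw]
    # Effective sample size as a degeneracy measure (threshold is an assumption).
    n_eff = 1.0 / sum(w * w for w in new_weights)
    if n_eff < n / 2.0:
        keep = systematic_resample(new_weights, rng)
        moved = [moved[i] for i in keep]
        new_weights = [1.0 / n] * n
    return moved, new_weights
```

The posterior-mean frequency estimate at time k is then the weighted sum of the particle frequencies.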

3.3.  Optimal  importance  function 

From the viewpoint of minimizing the variance of the weights, the
optimal importance function is given by [1, 5]

$$p(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) = \frac{p(y_k | \mathbf{x}_k)\, p(\mathbf{x}_k | \mathbf{x}_{n,k-1})}{\int_{\mathbf{x}} p(y_k | \mathbf{x})\, p(\mathbf{x} | \mathbf{x}_{n,k-1})\, d\mathbf{x}}.$$

Notice that, in contrast to the prior importance function, the above
takes into account the newly available measurement in the particle
update itself. While both the prior importance function and the
optimal one yield consistent algorithms$^1$, the optimal one usually
works well with much smaller N, and is therefore preferable from
a performance point of view. There are, however, two difficulties
associated with the use of the optimal importance function. First,
it requires multidimensional integration to compute the normalization
factor, which is often intractable. Second, sampling from
the optimal importance function is more complicated than sampling
from the prior. The smaller number of particles needed to
attain satisfactory performance with the optimal importance function
usually more than offsets the cost of drawing samples from it;
the integration problem remains the bottleneck in most cases [1].
Thankfully, for our particular model, it turns out that it is possible
to carry out this integration analytically. This is explained next.

$^1$ In the sense that the pertinent discrete measure approximations
converge to the sought continuous distributions as $N \to \infty$, see [1] and
references therein.

Denote $\mathbf{x}_k := [\omega_k, A_k]^T$, where $\omega_k \in [-\pi, \pi)$, and $A_k \in \mathbb{C}$;
likewise $\mathbf{x}_{n,k-1} := [\omega_{n,k-1}, A_{n,k-1}]^T$, and a dummy variable
$\mathbf{x} := [\omega, A]^T$. Let $D(y_k, \mathbf{x}_{n,k-1}) := \int_{\mathbf{x}} p(y_k | \mathbf{x})\, p(\mathbf{x} | \mathbf{x}_{n,k-1})\, d\mathbf{x}$.
Then

$$D(y_k, \mathbf{x}_{n,k-1}) = \int_{\omega \in [-\pi, \pi)} \int_{A \in \mathbb{C}} \frac{1}{2\pi\sigma_n^2} e^{-\frac{|y_k - A e^{j\omega k}|^2}{2\sigma_n^2}} \left[ (1 - h)\, \delta(\omega - \omega_{n,k-1})\, \delta(A - A_{n,k-1}) + \frac{h}{2\pi} \frac{1}{2\pi\sigma_A^2} e^{-\frac{|A|^2}{2\sigma_A^2}} \right] dA \, d\omega.$$

This integral can be computed by completing the squares, yielding

$$D(y_k, \mathbf{x}_{n,k-1}) = \frac{h}{2\pi(\sigma_n^2 + \sigma_A^2)} e^{-\frac{|y_k|^2}{2(\sigma_n^2 + \sigma_A^2)}} + \frac{1 - h}{2\pi\sigma_n^2} e^{-\frac{|y_k - A_{n,k-1} e^{j\omega_{n,k-1} k}|^2}{2\sigma_n^2}}.$$

For the above optimal choice of the importance function, the
weight update is given by

$$w_{n,k} \propto w_{n,k-1}\, p(y_k | \mathbf{x}_{n,k-1}) = w_{n,k-1}\, D(y_k, \mathbf{x}_{n,k-1}),$$

followed by normalization to 1. What is missing is a way to sample
from the optimal importance function. As a first step towards this
end, note that $p(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k)$ can be written as a mixture of two
pdfs

$$p(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) = (1 - \tilde{h})\, p_0(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) + \tilde{h}\, p_1(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k),$$

where

$$p_0(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) := \delta(\omega_k - \omega_{n,k-1})\, \delta(A_k - A_{n,k-1}),$$

$$p_1(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) := \frac{\frac{1}{2\pi} \, \frac{1}{2\pi\sigma_n^2} e^{-\frac{|y_k - A_k e^{j\omega_k k}|^2}{2\sigma_n^2}} \, \frac{1}{2\pi\sigma_A^2} e^{-\frac{|A_k|^2}{2\sigma_A^2}}}{\frac{1}{2\pi(\sigma_n^2 + \sigma_A^2)} e^{-\frac{|y_k|^2}{2(\sigma_n^2 + \sigma_A^2)}}},$$

and

$$\tilde{h} := h \, \frac{\frac{1}{2\pi(\sigma_n^2 + \sigma_A^2)} e^{-\frac{|y_k|^2}{2(\sigma_n^2 + \sigma_A^2)}}}{D(y_k, \mathbf{x}_{n,k-1})}.$$

It follows that with probability $1 - \tilde{h}$ we simply copy the previous
particle, else we draw a particle from $p_1(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k)$. We will
use rejection sampling techniques for this latter step, as explained
next.


3.4. Sampling from the optimal importance function: Rejection

The basic idea of rejection-based sampling can be summarized as
follows [4, pp. 40-42]. Suppose we wish to draw samples from
a density $\phi(\mathbf{x})$, for which there exists a dominating density $g(\mathbf{x})$
and a known constant c such that $\phi(\mathbf{x}) \le c\, g(\mathbf{x})$, $\forall \mathbf{x}$. In practice,
we choose $g(\mathbf{x})$ to be easy to sample from, and such that c is as
small as possible. The rejection method then works as follows.
We i) draw a sample $\mathbf{x}$ from $g(\cdot)$ and an independent sample
$U \sim \mathcal{U}([0, 1])$; ii) set $\tau := c\, g(\mathbf{x}) / \phi(\mathbf{x})$; iii) test whether $U\tau \le 1$;
if so, we accept the sample $\mathbf{x}$; else we reject it and repeat the process.

It can be shown that the above rejection method generates samples
from the desired density $\phi(\cdot)$, and the mean number of iterations
until a sample is accepted is c (thus the desire to keep $c \ge 1$
as small as possible). Furthermore, the distribution of the number
of trials is geometric with parameter $1 - \frac{1}{c}$, which means that the
probabilities of longer trials decay exponentially [4, p. 42].

In our present context, we wish to sample from the density
$p_1(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k)$. Define

$$\tilde{y} := \frac{|y_k|\, \sigma_A^2}{\sigma_n^2 + \sigma_A^2}, \qquad \tilde{\sigma}^2 := \frac{\sigma_n^2 \sigma_A^2}{\sigma_n^2 + \sigma_A^2}.$$

Using the triangle inequality, it can be shown that the following is
a suitable dominating density:

$$g(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) = \frac{e^{-\frac{(|A_k| - \tilde{y})^2}{2\tilde{\sigma}^2}}}{(2\pi)^{5/2}\, Q_0\, \tilde{\sigma}},$$

for which it holds that $p_1(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k) \le c\, g(\mathbf{x}_k | \mathbf{x}_{n,k-1}, y_k)$,
with

$$c := \sqrt{2\pi}\, Q_0 / \tilde{\sigma} \ge 1,$$

$$Q_0 := \int_{r=0}^{\infty} \frac{1}{\tilde{\sigma}\sqrt{2\pi}} e^{-\frac{(r - \tilde{y})^2}{2\tilde{\sigma}^2}} dr = \frac{1}{2} \mathrm{erfc}\!\left( -\frac{|y_k|\, \sigma_A}{\sigma_n \sqrt{2(\sigma_n^2 + \sigma_A^2)}} \right).$$
Through experimentation, we have found that even better results
can be attained using an outer rejection loop, which declines
candidates generated through rejection when the following metric
exceeds a certain small value (set to $3 \times 10^{-3}$ in our experiments):

$$\tilde{h}(y_k, \mathbf{x}_{n,k}) := h \, \frac{\frac{1}{2\pi(\sigma_n^2 + \sigma_A^2)} e^{-\frac{|y_k|^2}{2(\sigma_n^2 + \sigma_A^2)}}}{D(y_k, \mathbf{x}_{n,k})},$$

where $D(\cdot, \cdot)$ was defined in Sec. 3.3. This outer rejection loop
selects  particles  that  are  consistent  with  the  new  measurement  (cf. 
the  functional  form  of  the  denominator)  and,  at  the  same  time,  have 
large  weight  after  the  associated  update.  We  do  not  have  a  full 
explanation  at  this  point,  yet  this  version  of  the  algorithm  appears 
to  yield  the  best  results  -  in  particular,  better  than  the  one  based 
on  the  optimal  importance  function.  Note  that  the  latter  is  optimal 
with  respect  to  minimizing  the  variance  of  the  weights  after  the 
update  (and  typically  works  better  than  the  one  based  on  the  prior 
importance  function),  but  it  is  not  necessarily  optimal  in  terms  of 
the  performance  -  complexity  trade-off. 
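
The ingredients of Secs. 3.3 and 3.4 can be sketched in code: the closed-form normalization D and the hop probability, the copy-or-redraw particle update, and a generic accept/reject step. This is a hedged illustration, not the implementation evaluated below. All names are hypothetical; the sampler for p1 is passed in as a user-supplied callable `draw_p1` rather than implemented; the rejection step is demonstrated on a toy scalar density (Beta(2,2) under a uniform dominating density with c = 1.5), not on p1 itself; and CN(0, sigma^2) is assumed to have variance sigma^2 per real component.

```python
import cmath
import math
import random


def evidence(y, omega_prev, amp_prev, k, h, sigma2_a, sigma2_n):
    """Return (D, h_tilde) for one particle and the new measurement y."""
    s2 = sigma2_n + sigma2_a
    # Hop term: y is CN(0, sigma2_n + sigma2_a) after integrating out the state.
    hop = math.exp(-abs(y) ** 2 / (2.0 * s2)) / (2.0 * math.pi * s2)
    # No-hop term: likelihood of y under the previous particle.
    err = abs(y - amp_prev * cmath.exp(1j * omega_prev * k))
    stay = math.exp(-err ** 2 / (2.0 * sigma2_n)) / (2.0 * math.pi * sigma2_n)
    d = h * hop + (1.0 - h) * stay
    return d, h * hop / d


def optimal_update(particle, weight, y, k, h, sigma2_a, sigma2_n, draw_p1, rng):
    """Weight update w <- w * D (unnormalized), then copy or redraw."""
    d, h_tilde = evidence(y, particle[0], particle[1], k, h, sigma2_a, sigma2_n)
    new_weight = weight * d
    if rng.random() < h_tilde:
        return draw_p1(y, k, rng), new_weight   # hypothetical p1 sampler
    return particle, new_weight                 # copy the previous particle


def rejection_sample(phi, g_pdf, g_draw, c, rng):
    """Draw from density phi using dominating density g with phi <= c * g.

    Returns the accepted sample and the number of trials it took.
    """
    trials = 0
    while True:
        trials += 1
        x = g_draw(rng)
        u = rng.random()
        # Accept when U * tau <= 1 with tau = c g(x) / phi(x); written
        # without the division so that phi(x) = 0 simply rejects.
        if u * c * g_pdf(x) <= phi(x):
            return x, trials


# Toy usage: Beta(2,2) density 6x(1-x) on [0, 1], bounded by 1.5 * Uniform.
rng = random.Random(0)
draws = [rejection_sample(lambda x: 6.0 * x * (1.0 - x),
                          lambda x: 1.0,
                          lambda r: r.random(),
                          1.5, rng) for _ in range(5000)]
mean_trials = sum(t for _, t in draws) / len(draws)
```

In the toy run the average number of trials per accepted sample should be close to c = 1.5, in line with the mean iteration count stated in Sec. 3.4.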


4.  SIMULATIONS 

We  now  present  simulation  results  for  the  three  algorithms:  the 
basic  one  using  the  prior  importance  function  (denoted  P),  the  one 
using  the  optimal  importance  function  (O),  and  the  one  using  the 
outer  rejection  loop  as  above  (V).  Fig.  1  shows  a  plot  of  a  typi¬ 
cal  simulation  run,  using  the  posterior  mean  to  form  instantaneous 
frequency estimates and multinomial resampling for all three
algorithms. Monte-Carlo (MC) simulation results are presented in Fig.
2. The Root Mean Square Error (RMSE) frequency estimation
performance of the three algorithms is assessed using the following
parameters: $h = 0.01$, $T = 100$, $\sigma_A^2 = 1$, $\sigma_n^2 = 0.2$, and the
number of MC trials is 300. The execution time for P is O(NT),
whereas for O and V the execution time is also an increasing
function of h. As a result, O and/or V can be faster than P, even for the
same number of particles. For our simulation setup above, P, O,
and V, each with 1K particles, have about the same average
execution time, yet V does much better in terms of RMSE as shown in
Fig. 2. It takes 3K particles for O and 5K particles for P to reach
the performance of V with 1K particles.
Fig. 1. Typical sample run of the three algorithms using different
number of particles for each (particle filter estimates vs. true
frequencies).


5.  CONCLUSIONS 

We  have  developed  three  new  particle  filtering  algorithms  for  track¬ 
ing  a  frequency-hopped  complex  sinusoid,  based  on  a  novel  stochas¬ 
tic  state-space  formulation.  The  algorithms  range  from  a  plain- 
vanilla  version  that  uses  the  prior  importance  function  (P),  to  a 
more  advanced  version  that  employs  the  optimal  importance  func¬ 
tion  (O),  and,  finally,  an  improvement  of  the  latter  using  a  problem- 
specific  outer  rejection  loop  (V).  The  two  latter  algorithms  afford 
considerably  better  performance  -  complexity  trade-offs. 
ABSTRACT

We  consider  the  problem  of  transmit  beamforming  to  multiple  co- 
channel  multicast  groups.  Since  the  direct  minimization  of  transmit 
power  while  guaranteeing  a  prescribed  minimum  signal  to  interfer¬ 
ence  plus  noise  ratio  (SINR)  at  each  receiver  is  nonconvex  and  NP- 
hard,  we  present  convex  SDP  relaxations  of  this  problem  and  study 
when  such  relaxations  are  tight.  Our  results  show  that  when  the 
steering  vectors  for  all  receivers  are  of  Vandermonde  type  (such  as 
in  the  case  of  a  uniform  linear  array  and  line-of-sight  propagation), 
a  globally  optimum  solution  to  the  corresponding  transmit  beam¬ 
forming  problem  can  be  obtained  via  an  equivalent  SDP  reformula¬ 
tion.  We  also  present  various  robust  formulations  for  the  problem 
of  single-group  multicasting,  when  the  steering  vectors  are  only  ap¬ 
proximately  known.  Simulation  results  are  presented  to  illustrate  the 
effectiveness  of  our  SDP  relaxations  and  reformulations. 


1.  INTRODUCTION 

Consider a downlink transmission scenario where the transmitter is equipped with $N$ antennas and there are $M$ receivers. Let $\mathbf{h}_i$ denote the $N \times 1$ complex channel vector from each transmit antenna to the single receive antenna of user $i \in \{1, \ldots, M\}$. Let there be a total of $1 \leq G \leq M$ multicast groups, $\{\mathcal{G}_1, \ldots, \mathcal{G}_G\}$, where $\mathcal{G}_k$ is the index set for receivers participating in multicast group $k$, and $k \in \{1, \ldots, G\}$. Assume that $\mathcal{G}_k \cap \mathcal{G}_l = \emptyset$, $l \neq k$, $\cup_k \mathcal{G}_k = \{1, \ldots, M\}$, and, denoting $G_k := |\mathcal{G}_k|$, $\sum_{k=1}^{G} G_k = M$.

Let $\mathbf{w}_k^H$ denote the beamforming weight vector applied to the $N$ transmitting antenna elements to transmit multicast stream $k$. The signal transmitted by the antenna array is equal to $\sum_{k=1}^{G} \mathbf{w}_k^H s_k(t)$, where $s_k(t)$ is the temporal information-bearing signal directed to receivers in multicast group $k$. This setup includes the case of broadcasting ($G = 1$) [6] and the case of individual user transmissions ($G = M$) [2] as special cases. If each $s_k(t)$ is zero-mean white with unit variance, and the waveforms $\{s_k(t)\}_{k=1}^{G}$ are mutually uncorrelated, then the total power radiated is equal to $\sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2$. The joint design of transmit beamformers subject to received SINR constraints can then be posed as follows:

V:
$$\min_{\{\mathbf{w}_k \in \mathbb{C}^N\}_{k=1}^{G}} \; \sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2$$
$$\text{s.t.:} \quad \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}{\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2} \geq c_i, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\}.$$


Problem V was considered in [5] and was found to be NP-hard in the case of general steering vectors, based on arguments proved in earlier work [6]. Therefore, a two-step approach was proposed and shown to yield high-quality approximate solutions at manageable complexity cost. Specifically, in the first step, the original non-convex quadratically constrained quadratic programming (QCQP) problem V is relaxed to a semidefinite program (SDP) (denoted as $\mathcal{R}$), by changing the optimization variables to $\mathbf{X}_k := \mathbf{w}_k \mathbf{w}_k^H$ and dropping the associated non-convex constraints $\{\mathrm{rank}(\mathbf{X}_k) = 1\}_{k=1}^{G}$. In the second step, a randomization procedure is employed to generate candidate beamforming vectors from the solution of $\mathcal{R}$. For each candidate set of vectors, a multi-group power control (MGPC) linear programming (LP) problem is solved to ensure that the constraints of the original problem V are met. The final solution of this algorithm is the set of beamforming vectors yielding the smallest MGPC objective. The overall complexity of the algorithm is manageable, since the SDP and LP problems can be solved efficiently using interior point methods and the randomization procedure is designed so that its computational cost is negligible compared to the aforementioned problems.
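As a minimal sketch of what the second step has to verify, the received SINR of every user and the total radiated power can be evaluated for a given set of beamformers as follows. The function names, the array layout (beamformers and channels stored as columns), and the `groups` index array are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def sinr_per_user(W, H, groups, sigma2):
    """SINR of each user under problem V's constraint definition.

    W: N x G array, column k is the beamformer w_k (assumed layout).
    H: N x M array, column i is the channel h_i (assumed layout).
    groups: length-M integer array, groups[i] = index k of user i's group.
    sigma2: length-M array of noise variances.
    """
    # P[k, i] = |w_k^H h_i|^2, power of stream k seen at user i
    P = np.abs(W.conj().T @ H) ** 2
    users = np.arange(H.shape[1])
    signal = P[groups, users]                # power of the user's own stream
    interference = P.sum(axis=0) - signal    # power of all other streams
    return signal / (interference + sigma2)

def total_power(W):
    """Total radiated power, sum_k ||w_k||_2^2."""
    return float(np.sum(np.abs(W) ** 2))

def is_feasible(W, H, groups, sigma2, c, tol=1e-9):
    """True if every user meets its SINR target c_i."""
    return bool(np.all(sinr_per_user(W, H, groups, sigma2) >= c - tol))
```

With orthogonal channels and matched beamformers (for example `W = H = np.eye(2)`), the interference terms vanish and the SINR reduces to the signal-to-noise ratio.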


2.  EXACT  GLOBALLY  OPTIMAL  SOLUTION  IN  THE 
VANDERMONDE  CASE 

When a uniform linear array (ULA) is used for far-field transmit beamforming, the $N \times 1$ complex vectors which model the phase shift from each transmit antenna to the receive antenna of user $i \in \{1, \ldots, M\}$ are Vandermonde: $\mathbf{h}_i = [1 \;\; e^{j\theta_i} \;\; e^{j2\theta_i} \;\cdots\; e^{j(N-1)\theta_i}]^T$.

In this scenario, we observed that when the relaxed SDP problem $\mathcal{R}$ in [5] is feasible, its optimal solution, i.e., the blocks $\{\mathbf{X}_k^{\mathrm{opt}}\}_{k=1}^{G}$, are all consistently rank-one. This means that problem $\mathcal{R}$ is then equivalent to, and not a relaxation of, the original problem V. Thus, the second step of the proposed algorithm, comprising the randomization - multicast power control loop, turns out to be redundant, and the set of optimum beamforming vectors $\{\mathbf{w}_k^{\mathrm{opt}}\}_{k=1}^{G}$ can be formed simply using the principal components of the blocks $\{\mathbf{X}_k^{\mathrm{opt}}\}_{k=1}^{G}$. This observation suggests that, in the case of Vandermonde channel vectors, the original problem V is no longer NP-hard and can be equivalently posed as a convex optimization problem.

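Towards this end, note that for the special case of Vandermonde steering vectors, the signal power received at each user can be rewritten as

$$|\mathbf{w}_k^H \mathbf{h}_i|^2 = \sum_{\ell=-(N-1)}^{N-1} r_k(\ell)\, e^{j\ell\theta_i}, \qquad (1)$$

where $\ell := n - m$ and $r_k(\ell) := \sum_{m} w_k(m)\, w_k^*(m+\ell)$, the sum running over all $m$ for which both indices lie in $\{1, \ldots, N\}$. Let us consider $r_k(\ell)$ for $0 \leq \ell \leq N-1$, i.e., $r_k(\ell) = \sum_{m=1}^{N-\ell} w_k(m)\, w_k^*(m+\ell)$. Then $r_k^*(-\ell) = r_k(\ell)$, i.e., $r_k(\ell)$ is conjugate-symmetric about the origin. Define the $(2N-1) \times 1$ vector

$$\mathbf{r}_k := [r_k(-N+1), \cdots, r_k(-1), r_k(0), r_k(1), \cdots, r_k(N-1)]^T, \qquad (2)$$

and the associated $(2N-1) \times 1$ “extended” steering vector

$$\mathbf{f}_i := [e^{-j(N-1)\theta_i}, \cdots, e^{-j\theta_i}, 1, e^{j\theta_i}, \cdots, e^{j(N-1)\theta_i}]^T. \qquad (3)$$

Then $|\mathbf{w}_k^H \mathbf{h}_i|^2 = \mathbf{f}_i^T \mathbf{r}_k$. Furthermore, note that $r_k(0) = \mathbf{r}_k(N) = \sum_{m=1}^{N} w_k(m)\, w_k^*(m) = \|\mathbf{w}_k\|_2^2$, where $\mathbf{r}_k(N)$ denotes the $N$-th (middle) entry of $\mathbf{r}_k$. It therefore follows that the original problem V can be equivalently written as follows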


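$$\min_{\{\mathbf{r}_k\}_{k=1}^{G}} \; \sum_{k=1}^{G} \mathbf{r}_k(N)$$
$$\text{s.t.:} \quad \mathbf{f}_i^T \mathbf{r}_k \geq c_i \sum_{l \neq k} \mathbf{f}_i^T \mathbf{r}_l + c_i \sigma_i^2, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\},$$
$$\mathbf{r}_k: \text{autocorrelation vector}, \quad \forall k \in \{1, \ldots, G\},$$

where the fact that the terms in the denominator are all non-negative has also been taken into account.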
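The identity behind this reformulation, namely that the received power $|\mathbf{w}^H \mathbf{h}|^2$ equals the inner product of the extended steering vector with the autocorrelation vector, can be checked numerically. The sketch below uses zero-based indexing and hypothetical helper names; the random test vector and the angle value are arbitrary choices for illustration.

```python
import numpy as np

def autocorr_vector(w):
    """Return [r(-N+1), ..., r(0), ..., r(N-1)], r(l) = sum_m w[m] * conj(w[m+l])."""
    N = len(w)
    r = np.zeros(2 * N - 1, dtype=complex)
    for l in range(-(N - 1), N):
        # keep only indices m with 0 <= m < N and 0 <= m + l < N
        m = np.arange(max(0, -l), min(N, N - l))
        r[l + N - 1] = np.sum(w[m] * np.conj(w[m + l]))
    return r

def extended_steering(theta, N):
    """Extended steering vector [e^{-j(N-1)theta}, ..., 1, ..., e^{j(N-1)theta}]."""
    l = np.arange(-(N - 1), N)
    return np.exp(1j * l * theta)

rng = np.random.default_rng(0)
N, theta = 4, 0.7
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = np.exp(1j * theta * np.arange(N))        # Vandermonde steering vector

r = autocorr_vector(w)
f = extended_steering(theta, N)
power_direct = abs(np.vdot(w, h)) ** 2       # |w^H h|^2
power_autocorr = (f @ r).real                # f^T r (no conjugation)
```

The middle entry `r[N - 1]` is the zero-lag term, which equals the squared norm of `w` and therefore plays the role of the linear cost.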

This is a problem comprising a linear cost, $M$ linear inequality constraints, and autocorrelation constraints. Each of the latter is equivalent to a linear matrix inequality (LMI) constraint [1]. Specifically, $r_k(m)$, $\forall m \in \{-N+1, \ldots, N-1\}$, belongs to the set of finite autocorrelation sequences if and only if $r_k(m) = \mathrm{trace}(\mathbf{E}^m \mathbf{Y}_k)$, $\forall m \in \{-N+1, \ldots, N-1\}$, for some positive semidefinite matrix $\mathbf{Y}_k \in \mathbb{C}^{N \times N}$, where $\mathbf{E}$ is the $N \times N$ unit-shift matrix with ones in the first lower sub-diagonal and zeros elsewhere.
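One direction of this equivalence is easy to illustrate: for the rank-one positive semidefinite matrix $\mathbf{Y} = \mathbf{w}\mathbf{w}^H$, the traces $\mathrm{trace}(\mathbf{E}^m \mathbf{Y})$ reproduce the autocorrelation lags of $\mathbf{w}$. In the sketch below, negative powers of the (singular) shift matrix are interpreted as powers of its transpose; that interpretation, like the helper names, is an assumption made for illustration.

```python
import numpy as np

def shift_power(N, m):
    """E^m for m >= 0 and (E^T)^|m| for m < 0, with E the lower unit-shift matrix.

    Reading negative powers as powers of E^T is an assumption of this sketch.
    """
    E = np.eye(N, k=-1)                      # ones on the first lower sub-diagonal
    base = E if m >= 0 else E.T
    return np.linalg.matrix_power(base, abs(m))

rng = np.random.default_rng(3)
N = 4
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Y = np.outer(w, w.conj())                    # rank-one PSD matrix w w^H

lags = range(-(N - 1), N)
r_trace = np.array([np.trace(shift_power(N, m) @ Y) for m in lags])
r_direct = np.array([
    sum(w[i] * np.conj(w[i + m]) for i in range(N) if 0 <= i + m < N)
    for m in lags
])
```

Both arrays list the lags from $-(N-1)$ to $N-1$, so they can be compared entry by entry.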

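Thus, introducing $G$ positive semidefinite $N \times N$ “slack” matrices, one for each autocorrelation vector $\mathbf{r}_k$, the autocorrelation constraints are equivalently converted to linear equality constraints plus positive semidefinite constraints as follows

$$\min_{\{\mathbf{r}_k\}_{k=1}^{G},\; \{\mathbf{Y}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G}} \; \sum_{k=1}^{G} \mathbf{r}_k(N)$$
$$\text{s.t.:} \quad \mathbf{f}_i^T \mathbf{r}_k - c_i \sum_{l \neq k} \mathbf{f}_i^T \mathbf{r}_l \geq c_i \sigma_i^2, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\},$$
$$r_k(m) = \mathrm{trace}(\mathbf{E}^m \mathbf{Y}_k), \quad \forall m \in \{-N+1, \ldots, N-1\}, \; \forall k \in \{1, \ldots, G\},$$
$$\mathbf{Y}_k \succeq 0, \quad \forall k \in \{1, \ldots, G\}.$$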


Problem V is an SDP problem which can be efficiently solved by any standard SDP solver, such as SeDuMi [7], by means of interior point methods. Once the optimum autocorrelation sequences $\{\mathbf{r}_k^{\mathrm{opt}}\}_{k=1}^{G}$ are found, they can be factored to obtain the respective optimum beamforming vectors $\{\mathbf{w}_k^{\mathrm{opt}}\}_{k=1}^{G}$, using spectral factorization techniques [9].


A simple simulation experiment illustrates the equivalence of the aforementioned algorithm to the one proposed in [5]. Figures 1 and 2 show the optimized transmit beam patterns generated by algorithm 1 (SDP relaxation problem $\mathcal{R}$ and randomization - multicast power control problem MGPC) and algorithm 2 (SDP problem V and spectral factorization), respectively. The ULA consists of $N = 4$ transmit antenna elements spaced $\lambda/2$ apart. The $M = 24$ users are considered evenly clustered in $G = 2$ groups, at an angle of 0.5 degrees to their neighboring ones. The angular cluster separation (defined as the minimum angle between any 2 users belonging to different groups) is set to 10 degrees. The received SINR constraints are set to 10 dB for all users and the noise variance to $\sigma^2 = 1$ for all channels.

3.  ROBUST  RELAXATION  OF  SINGLE-GROUP 
MULTICAST  BEAMFORMING 

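In this section we provide a robust relaxation to the problem of downlink transmit beamforming towards a single multicast group, which was considered in [6]. The key difference here is that full channel state information (CSI) is no longer available; instead, the channel vectors are assumed to lie in a ball with known center and radius. Specifically, letting $\tilde{\mathbf{h}}_i := \mathbf{h}_i / \sqrt{c_i \sigma_i^2}$ denote the normalized channel vectors, we assume that $\tilde{\mathbf{h}}_i \in B_\epsilon(\bar{\mathbf{h}}_i) := \{\tilde{\mathbf{h}}_i \mid \tilde{\mathbf{h}}_i = \bar{\mathbf{h}}_i + \mathbf{e}, \; \|\mathbf{e}\|_2 \leq \epsilon\}$. The robust design of the beamformer that minimizes the transmitted power, subject to constraints on the received SNR, can be written as

$\mathcal{RB}$:
$$\min_{\mathbf{w} \in \mathbb{C}^N} \; \|\mathbf{w}\|_2^2$$
$$\text{s.t.:} \quad |\mathbf{w}^H \tilde{\mathbf{h}}_i|^2 \geq 1, \quad \forall\, \tilde{\mathbf{h}}_i \in B_\epsilon(\bar{\mathbf{h}}_i), \; \forall i \in \{1, \ldots, M\}.$$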


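The constraints in problem $\mathcal{RB}$ guarantee that the received signal power at all $M$ users will be at least unity in the worst case, i.e., for the particular channel vector $\tilde{\mathbf{h}}_i$ that corresponds to the smallest value of $|\mathbf{w}^H \tilde{\mathbf{h}}_i|^2$. Each one of these constraints is equivalent to the semi-infinite nonconvex constraint

$$|\mathbf{w}^H \tilde{\mathbf{h}}_i| \geq 1, \quad \forall\, \tilde{\mathbf{h}}_i \in B_\epsilon(\bar{\mathbf{h}}_i), \qquad (4)$$

which admits a convex (SOC) reformulation, as shown in [8]. First note that equation (4) can be equivalently written as

$$\min_{\tilde{\mathbf{h}}_i \in B_\epsilon(\bar{\mathbf{h}}_i)} |\mathbf{w}^H \tilde{\mathbf{h}}_i| \geq 1. \qquad (5)$$

Under the natural constraint $|\mathbf{w}^H \bar{\mathbf{h}}_i| > \epsilon \|\mathbf{w}\|_2$, it can be shown [8] that

$$\min_{\tilde{\mathbf{h}}_i \in B_\epsilon(\bar{\mathbf{h}}_i)} |\mathbf{w}^H \tilde{\mathbf{h}}_i| = |\mathbf{w}^H \bar{\mathbf{h}}_i| - \epsilon \|\mathbf{w}\|_2, \qquad (6)$$

and we can recast equation (5) as

$$|\mathbf{w}^H \bar{\mathbf{h}}_i| - \epsilon \|\mathbf{w}\|_2 \geq 1 \iff |\mathbf{w}^H \bar{\mathbf{h}}_i| \geq 1 + \epsilon \|\mathbf{w}\|_2. \qquad (7)$$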

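The robust beamforming problem $\mathcal{RB}$ is thus equivalently written as

$\mathcal{RB}'$:
$$\min_{\mathbf{w} \in \mathbb{C}^N} \; \|\mathbf{w}\|_2^2$$
$$\text{s.t.:} \quad |\mathbf{w}^H \bar{\mathbf{h}}_i| \geq 1 + \epsilon \|\mathbf{w}\|_2, \quad \forall i \in \{1, \ldots, M\}.$$

Let us also consider the corresponding original non-robust beamforming (ONRB) problem:

$$\min_{\mathbf{w} \in \mathbb{C}^N} \; \|\mathbf{w}\|_2^2$$
$$\text{s.t.:} \quad |\mathbf{w}^H \bar{\mathbf{h}}_i| \geq 1, \quad \forall i \in \{1, \ldots, M\}.$$

Our main result in this section is the following:

Claim 1 Let $\mathbf{w}'$ be an exact solution of $\mathcal{RB}'$. Then $\mathbf{w}'/(1 + \epsilon\|\mathbf{w}'\|_2)$ is an exact solution of ONRB. Conversely, if $\mathbf{w}_0$ is an exact solution of ONRB, then $\mathbf{w}_0/(1 - \epsilon\|\mathbf{w}_0\|_2)$ is an exact solution of $\mathcal{RB}'$.

Proof: Forward: The proof is based on two Lemmas. The first is the following Scaling Lemma.

Lemma 1 $\mathbf{w}_0$ is an exact solution of ONRB if and only if $t\mathbf{w}_0$ is an exact solution of

$$\min_{\mathbf{w} \in \mathbb{C}^N} \; \|\mathbf{w}\|_2^2$$
$$\text{s.t.:} \quad |\mathbf{w}^H \bar{\mathbf{h}}_i| \geq t, \quad \forall i \in \{1, \ldots, M\}.$$

Proof: $|\mathbf{w}_0^H \bar{\mathbf{h}}_i| \geq 1 \Rightarrow |t\mathbf{w}_0^H \bar{\mathbf{h}}_i| \geq t$. Suppose there exists $\mathbf{w}_1$ with $|\mathbf{w}_1^H \bar{\mathbf{h}}_i| \geq t$, $\forall i$, and $\|\mathbf{w}_1\|_2^2 < t^2 \|\mathbf{w}_0\|_2^2$. Consider $\mathbf{w}_2 := \mathbf{w}_1/t$. It satisfies $|\mathbf{w}_2^H \bar{\mathbf{h}}_i| \geq 1$, and

$$\|\mathbf{w}_2\|_2^2 = \frac{1}{t^2}\|\mathbf{w}_1\|_2^2 < \frac{1}{t^2}\, t^2 \|\mathbf{w}_0\|_2^2 = \|\mathbf{w}_0\|_2^2, \qquad (8)$$

which contradicts optimality of $\mathbf{w}_0$ for ONRB. The converse is obvious. □

Lemma 2 Let $\mathbf{w}'$ be an exact solution of $\mathcal{RB}'$. Then, $\mathbf{w}'$ is an exact solution of the following non-robust beamforming problem (NRB)

$$\min_{\mathbf{w} \in \mathbb{C}^N} \; \|\mathbf{w}\|_2^2$$
$$\text{s.t.:} \quad |\mathbf{w}^H \bar{\mathbf{h}}_i| \geq 1 + \epsilon \|\mathbf{w}'\|_2, \quad \forall i \in \{1, \ldots, M\}.$$

Proof: Clearly, $\mathbf{w}'$ is a feasible solution of NRB, since it satisfies the constraints. Suppose there exists $\mathbf{w}''$ that also satisfies the constraints of NRB, but with $\|\mathbf{w}''\|_2 < \|\mathbf{w}'\|_2$. Then $1 + \epsilon\|\mathbf{w}'\|_2 > 1 + \epsilon\|\mathbf{w}''\|_2$, and thus $\mathbf{w}''$ also satisfies the constraints of problem $\mathcal{RB}'$, with $\|\mathbf{w}''\|_2 < \|\mathbf{w}'\|_2$. This contradicts optimality of $\mathbf{w}'$ for $\mathcal{RB}'$. □

Now suppose that $\mathbf{w}'$ is an exact solution of $\mathcal{RB}'$. It follows from the last Lemma that it is also an exact solution of NRB. Then, from the Scaling Lemma, it follows that $\mathbf{w}'/(1 + \epsilon\|\mathbf{w}'\|_2)$ is an exact solution of ONRB. This completes the forward part of the proof of Claim 1. □

Converse: Let $\mathbf{w}_0$ be a solution of ONRB. Then, according to the Scaling Lemma,

$$\mathbf{w}' := \frac{\mathbf{w}_0}{1 - \epsilon\|\mathbf{w}_0\|_2} \qquad (9)$$

is a solution of the modified NRB (MNRB) problem

$$\min_{\mathbf{w} \in \mathbb{C}^N} \; \|\mathbf{w}\|_2^2$$
$$\text{s.t.:} \quad |\mathbf{w}^H \bar{\mathbf{h}}_i| \geq \frac{1}{1 - \epsilon\|\mathbf{w}_0\|_2}, \quad \forall i \in \{1, \ldots, M\}.$$

We will show that $\mathbf{w}'$ is also a solution of $\mathcal{RB}'$. Since $\mathbf{w}'$ is a solution of MNRB, it follows that

$$|\mathbf{w}'^H \bar{\mathbf{h}}_i| \geq \frac{1}{1 - \epsilon\|\mathbf{w}_0\|_2}. \qquad (10)$$

However, from (9), it follows (provided that $1 - \epsilon\|\mathbf{w}_0\|_2 > 0$, i.e., $\epsilon < 1/\|\mathbf{w}_0\|_2$) that

$$\|\mathbf{w}'\|_2 = \frac{\|\mathbf{w}_0\|_2}{1 - \epsilon\|\mathbf{w}_0\|_2} \iff \|\mathbf{w}_0\|_2 = \frac{\|\mathbf{w}'\|_2}{1 + \epsilon\|\mathbf{w}'\|_2}.$$

Hence

$$\frac{1}{1 - \epsilon\|\mathbf{w}_0\|_2} = \frac{1}{1 - \frac{\epsilon\|\mathbf{w}'\|_2}{1 + \epsilon\|\mathbf{w}'\|_2}} = 1 + \epsilon\|\mathbf{w}'\|_2, \qquad (11)$$

so $\mathbf{w}'$ indeed satisfies the constraints of $\mathcal{RB}'$. Suppose there exists $\mathbf{w}''$, such that $\|\mathbf{w}''\|_2 < \|\mathbf{w}'\|_2$, which also satisfies the constraints of $\mathcal{RB}'$. From the forward proof it follows that $\mathbf{w}''/(1 + \epsilon\|\mathbf{w}''\|_2)$ satisfies the constraints of ONRB, with norm $\|\mathbf{w}''\|_2/(1 + \epsilon\|\mathbf{w}''\|_2)$. On the other hand, $\mathbf{w}_0$ in (9) is an exact solution of ONRB, and $\|\mathbf{w}'\|_2 = \|\mathbf{w}_0\|_2/(1 - \epsilon\|\mathbf{w}_0\|_2)$, yielding $\|\mathbf{w}_0\|_2 = \|\mathbf{w}'\|_2/(1 + \epsilon\|\mathbf{w}'\|_2)$. But $x/(1 + \epsilon x)$ is monotone increasing in $x \geq 0$. Therefore, $\|\mathbf{w}''\|_2 < \|\mathbf{w}'\|_2$ implies that

$$\frac{\|\mathbf{w}''\|_2}{1 + \epsilon\|\mathbf{w}''\|_2} < \frac{\|\mathbf{w}'\|_2}{1 + \epsilon\|\mathbf{w}'\|_2} = \|\mathbf{w}_0\|_2, \qquad (12)$$

which contradicts optimality of $\mathbf{w}_0$ for ONRB. Thus, the proof of Claim 1 is complete. □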
Claim 1 implies that we can derive an exact solution of the robust beamforming problem $\mathcal{RB}'$ by a simple scaling of a solution to ONRB. Since both problems are NP-hard in general, in practice this translates to the following algorithm:

1. Compute a good feasible solution $\mathbf{w}_0$ for ONRB using the SDP relaxation approach in [6].

2. A good feasible solution of $\mathcal{RB}'$ is then $\mathbf{w}_0/(1 - \epsilon\|\mathbf{w}_0\|_2)$.

Letting $c_0$ and $c'$ denote the norms of the optimal solutions of ONRB and $\mathcal{RB}'$, respectively, we also have

$$c_0 = \frac{c'}{1 + \epsilon c'} \iff c' = \frac{c_0}{1 - \epsilon c_0}. \qquad (13)$$

Claim 1 further suggests that if we set $\epsilon \geq 1/\|\mathbf{w}_0\|_2$, then the robust problem would be infeasible.
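The scaling step can be sketched in a few lines. The example below uses a single-user toy case ($M = 1$), for which the minimum-norm vector satisfying $|\mathbf{w}^H \bar{\mathbf{h}}| \geq 1$ is known in closed form; this toy setup, the function name, and the random perturbation check are assumptions chosen for illustration and do not replace the SDP relaxation step of [6] in the general multicast case.

```python
import numpy as np

def robust_from_nonrobust(w0, eps):
    """Scale a (near-)optimal ONRB vector w0 into a candidate robust solution."""
    norm0 = np.linalg.norm(w0)
    if eps * norm0 >= 1.0:
        # matches the infeasibility condition eps >= 1 / ||w0||_2
        raise ValueError("eps >= 1/||w0||_2: scaling is undefined")
    return w0 / (1.0 - eps * norm0)

rng = np.random.default_rng(1)
h_bar = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Toy case M = 1: by Cauchy-Schwarz the minimum-norm w with |w^H h| >= 1
# is h / ||h||^2, which has norm 1 / ||h||.
w0 = h_bar / np.linalg.norm(h_bar) ** 2
eps = 0.5 * np.linalg.norm(h_bar)            # gives eps * ||w0|| = 0.5 < 1
w_rob = robust_from_nonrobust(w0, eps)

# Empirical worst case over perturbations on the boundary of the ball.
worst = np.inf
for _ in range(2000):
    e = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    e *= eps / np.linalg.norm(e)
    worst = min(worst, abs(np.vdot(w_rob, h_bar + e)))
```

In this toy case the nominal constraint is tight, so the worst-case received amplitude over the uncertainty ball stays at or above one.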


4.  EXACT  ROBUST  SOLUTION  IN  THE  SINGLE-GROUP 
VANDERMONDE  CASE 

Let us consider again the case when the steering vectors are Vandermonde. Then, the single-group ($G = 1$) version of problem V can be written as

V1:
$$\min_{\mathbf{r} \in \mathbb{R} \times \mathbb{C}^{N-1}} \; \mathbf{e}_1^T \mathbf{r}$$
$$\text{s.t.:} \quad \mathrm{Re}[\mathbf{h}_i^H \tilde{\mathbf{I}} \mathbf{r}] \geq c_i \sigma_i^2, \quad \forall i \in \{1, \ldots, M\},$$
$$r_\ell = \mathrm{trace}(\mathbf{E}^\ell \mathbf{Y}), \quad \forall \ell \in \{0, \ldots, N-1\},$$
$$\mathbf{Y} \succeq 0,$$

where $\mathbf{e}_1$ is the first column of the $N \times N$ identity matrix,

$$r_\ell = \sum_{m=1}^{N-\ell} w_m^* w_{m+\ell}, \quad \forall \ell \in \{0, \ldots, N-1\}, \qquad (14)$$

$$\mathbf{r} = [r_0 \;\; r_1 \;\cdots\; r_{N-1}]^T \in \mathbb{R} \times \mathbb{C}^{N-1}, \qquad (15)$$

and

$$\tilde{\mathbf{I}} := \mathrm{diag}(1, 2, \ldots, 2) \in \mathbb{R}^{N \times N}. \qquad (16)$$

A robust extension of problem V1 would be to ask that the SNR constraints are still met when the angles $\{\theta_i\}_{i=1}^{M}$ are not known exactly, but are subject to an estimation error of up to $\Delta$, i.e., they are assumed to lie within the intervals $\theta_i \in [\bar{\theta}_i - \Delta, \bar{\theta}_i + \Delta]$. In such a scenario, the SNR constraints are defined as

$$\mathrm{Re}[\mathbf{h}_i^H \tilde{\mathbf{I}} \mathbf{r}] \geq c_i \sigma_i^2, \quad \forall i \in \{1, \ldots, M\}, \; \forall \theta_i \in [\bar{\theta}_i - \Delta, \bar{\theta}_i + \Delta]. \qquad (17)$$

An interpretation of these constraints is that they require (the real part of) certain trigonometric polynomials to be nonnegative over a segment of the unit circle. As shown in [4], constraints of this form can be equivalently reformulated as the LMI constraints

$$\tilde{\mathbf{I}} \mathbf{r} - (c_i \sigma_i^2 + j\xi_i)\, \mathbf{e}_1 = L^*(\mathbf{X}_i) + \Lambda^*(\mathbf{Z}_i;\, \bar{\theta}_i - \Delta,\, \bar{\theta}_i + \Delta), \qquad (18)$$

$\forall i \in \{1, \ldots, M\}$, where $\mathbf{X}_i \in \mathbb{C}^{N \times N} \succeq 0$, $\mathbf{Z}_i \in \mathbb{C}^{(N-1) \times (N-1)} \succeq 0$, $\xi_i \in \mathbb{R}$ is unconstrained, and the linear operators $L^*$ and $\Lambda^*$ are defined by equations (35) and (36) (along with (16)) in [4], respectively. Hence, the problem encountered in this section is an SDP problem, since it consists of a linear cost, $MN$ linear equality constraints and $2M$ positive semidefinite constraints.

5. CONCLUSIONS

Whereas multi-group multicast transmit beamforming under SINR constraints is NP-hard in general [5, 6], we have shown that, in the special case of Vandermonde steering vectors, it is in fact a semidefinite problem, which can be efficiently solved. We have also considered robust beamforming solutions under channel uncertainty for the case of a single multicast group. For general steering vectors, we have shown that exact solutions of the robust and non-robust versions of the problem are related via a simple one-to-one scaling transformation. Since both problems are NP-hard, this suggests an algorithm to generate a quasi-optimal solution for one given a quasi-optimal solution for the other. In the important special case of Vandermonde steering vectors, we have shown that the robust version of the problem is convex as well. This robust solution can be extended to the multi-group Vandermonde case.

[Plot title: Algorithm 1: SDR + Randomization + MGPC; 24 users in 2 groups, spaced 10 deg apart]

Fig. 1. SDP Relaxation + Randomization result for ULA, N = 4, M = 2 x 12, SINR = 10 dB

[Plot title: Algorithm 2: SDP + Spectral factorization; 24 users in 2 groups, spaced 10 deg apart]

Fig. 2. Exact SDP + Spectral Factorization result for ULA, N = 4,
M = 2 x 12, SINR = 10 dB

ABSTRACT

The problem of transmit beamforming to multiple co-channel multicast groups is considered, from the viewpoint of guaranteeing a prescribed minimum signal-to-interference-plus-noise-ratio (SINR) at each receiver. The problem is a multicast generalization of the SINR-constrained multiuser downlink beamforming problem: the difference is that each transmitted stream is directed to multiple receivers, each with its own channel. Such generalization is relevant and timely, e.g., in the context of 802.16 wireless networks. Based on earlier results for a single multicast group, the joint problem is easily shown to be NP-hard, a fact that motivates the pursuit of quasi-optimal computationally efficient solutions. It is shown that Lagrangian relaxation coupled with a randomization / co-channel multicast power control loop yields a computationally efficient high-quality approximate solution. For a significant fraction of problem instances, the solutions generated this way are exactly optimal. Carefully designed and extensive simulation results are presented to support the main findings.

*Supported in part by the U.S. ARO under ERO Contract N62558-03-

1. DATA MODEL AND PROBLEM STATEMENT

Consider a wireless scenario incorporating a single transmitter with $N$ antenna elements and $M$ receivers, each with a single antenna. Let $\mathbf{h}_i$ denote the $N \times 1$ complex vector that models the propagation loss and phase shift of the frequency-flat quasi-static channel from each transmit antenna to the receive antenna of user $i \in \{1, \ldots, M\}$. Let there be a total of $1 \leq G \leq M$ multicast groups, $\{\mathcal{G}_1, \ldots, \mathcal{G}_G\}$, where $\mathcal{G}_k$ contains the indices of receivers participating in multicast group $k$, and $k \in \{1, \ldots, G\}$. Each receiver listens to a single multicast; thus $\mathcal{G}_k \cap \mathcal{G}_l = \emptyset$, $l \neq k$, $\cup_k \mathcal{G}_k = \{1, \ldots, M\}$, and, denoting $G_k := |\mathcal{G}_k|$, $\sum_{k=1}^{G} G_k = M$.

Let $\mathbf{w}_k$ denote the beamforming weight vector applied to the $N$ transmitting antenna elements to generate the spatial channel for transmitting to group $k$. Then the signal transmitted by the antenna array is equal to $\sum_{k=1}^{G} \mathbf{w}_k^H s_k(t)$, where $s_k(t)$ is the temporal information-bearing signal directed to receivers in multicast group $k$. Note that the above setup includes the case of broadcasting (a single multicast group, $G = 1$) [6], as well as the case of individual information transmission to each receiver ($G = M$) by means of spatial multiplexing (see, e.g., [1]). If each $s_k(t)$ is zero-mean white with unit variance, and the waveforms $\{s_k(t)\}_{k=1}^{G}$ are mutually uncorrelated, then the total power radiated by the transmitting antenna array is equal to $\sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2$.

The joint design of transmit beamformers can then be posed as the problem of minimizing the total radiated power subject to meeting prescribed SINR constraints $c_i$ at each of the $M$ receivers

X:
$$\min_{\{\mathbf{w}_k \in \mathbb{C}^N\}_{k=1}^{G}} \; \sum_{k=1}^{G} \|\mathbf{w}_k\|_2^2$$
$$\text{s.t.:} \quad \frac{|\mathbf{w}_k^H \mathbf{h}_i|^2}{\sum_{l \neq k} |\mathbf{w}_l^H \mathbf{h}_i|^2 + \sigma_i^2} \geq c_i, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\}.$$

Problem X contains the associated broadcasting problem as a special case; from this and [6], it immediately follows that

Claim 1 Problem X is NP-hard.

This motivates (cf. [4]) the pursuit of sensible approximate solutions to problem X.

2. RELAXATION

Towards this end, define $\mathbf{Q}_i := \mathbf{h}_i \mathbf{h}_i^H$ and $\mathbf{X}_k := \mathbf{w}_k \mathbf{w}_k^H$, and note that $|\mathbf{w}_k^H \mathbf{h}_i|^2 = \mathbf{h}_i^H \mathbf{w}_k \mathbf{w}_k^H \mathbf{h}_i = \mathrm{trace}(\mathbf{h}_i^H \mathbf{w}_k \mathbf{w}_k^H \mathbf{h}_i) = \mathrm{trace}(\mathbf{h}_i \mathbf{h}_i^H \mathbf{w}_k \mathbf{w}_k^H) = \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k)$. Then, problem X can be equivalently reformulated as

$$\min_{\{\mathbf{X}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G}} \; \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k)$$
$$\text{s.t.:} \quad \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k) \geq c_i \sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) + c_i \sigma_i^2, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\},$$
$$\mathbf{X}_k \succeq 0, \quad \forall k \in \{1, \ldots, G\},$$
$$\mathrm{rank}(\mathbf{X}_k) = 1, \quad \forall k \in \{1, \ldots, G\},$$

where the fact that the terms in the denominator are all non-negative has also been taken into account. Dropping the rank-one constraints, we arrive at the following relaxation of problem X

$\mathcal{R}$:
$$\min_{\{\mathbf{X}_k \in \mathbb{C}^{N \times N}\}_{k=1}^{G},\; \{s_i \in \mathbb{R}\}_{i=1}^{M}} \; \sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k)$$
$$\text{s.t.:} \quad \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_k) - c_i \sum_{l \neq k} \mathrm{trace}(\mathbf{Q}_i \mathbf{X}_l) - s_i = c_i \sigma_i^2, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\},$$
$$s_i \geq 0, \quad \forall i \in \{1, \ldots, M\},$$
$$\mathbf{X}_k \succeq 0, \quad \forall k \in \{1, \ldots, G\},$$

where $M$ non-negative real “slack” variables $s_i$ have been introduced, in order to convert the inequality constraints to equality constraints, plus non-negativity constraints. Problem $\mathcal{R}$ is a Semi-Definite Program (SDP), expressed in the primal standard form used by SDP solvers, such as SeDuMi [7]. SeDuMi uses interior point methods to solve such SDP problems efficiently, at a complexity cost that is at most $O((GN^2 + M)^{3.5})$, and usually much less.
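The trace identity that turns the quadratic SINR terms into linear functions of $\mathbf{X}_k$ can be confirmed numerically with a short sketch. The random vectors and variable names below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)

Q = np.outer(h, h.conj())                 # Q = h h^H
X = np.outer(w, w.conj())                 # X = w w^H, rank one and PSD

power_quadratic = abs(np.vdot(w, h)) ** 2 # |w^H h|^2
power_linear = np.trace(Q @ X).real       # trace(Q X), linear in X
cost_linear = np.trace(X).real            # trace(X) = ||w||_2^2
rank_X = np.linalg.matrix_rank(X)
```

The rank-one condition on `X` is the only non-convex part of the reformulated problem; once it is dropped, both the cost and the constraints are linear in the matrix variables.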


3.  OBTAINING  AN  APPROXIMATE  SOLUTION  TO 
PROBLEM  X 

Problem X may not admit a feasible solution (counter-examples may be easily constructed), but if it does, the aforementioned approach will yield a solution to problem $\mathcal{R}$. Due to relaxation, this solution will not, in general, consist of rank-one blocks. In order to obtain a high-quality approximate solution of problem X, the concept of randomization can be employed to generate candidate beamforming vectors in the span of the respective transmit covariance matrices; see, for example, [6]. The main difference relative to the simpler broadcast case ($G = 1$) considered in [6] is that here we cannot simply “scale up” the candidate beamforming vectors generated during randomization to satisfy the hard constraints of problem X. The reason is that, in contrast to [6], we herein deal with an interference scenario, and boosting one group's beamforming vector also increases interference to nodes in other groups. Whether it is feasible to satisfy the constraints for a given set of candidate beamforming vectors is also an issue here. Towards resolving this situation, let $\alpha_{k,i} := |\mathbf{w}_k^H \mathbf{h}_i|^2$ denote the signal power received at receiver $i$ from the stream directed towards users in multicast group $k$. Let $\beta_k := \|\mathbf{w}_k\|_2^2$, and let $p_k$ denote the power boost factor for multicast group $k$. Then the following Multi-Group Power Control (MGPC) problem emerges in converting candidate beamforming vectors to a candidate solution of problem X


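MGPC:
$$\min_{\{p_k \in \mathbb{R}\}_{k=1}^{G}} \; \sum_{k=1}^{G} \beta_k p_k$$
$$\text{s.t.:} \quad \frac{p_k \alpha_{k,i}}{\sum_{l \neq k} p_l \alpha_{l,i} + \sigma_i^2} \geq c_i, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\},$$
$$p_k \geq 0, \quad \forall k \in \{1, \ldots, G\}.$$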

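As in Section 2, taking advantage of the fact that the terms in the denominator are all non-negative and introducing $M$ non-negative real “slack” variables $s_i$, problem MGPC can be reformulated as

MGPC:
$$\min_{\{p_k \in \mathbb{R}\}_{k=1}^{G},\; \{s_i \in \mathbb{R}\}_{i=1}^{M}} \; \sum_{k=1}^{G} \beta_k p_k$$
$$\text{s.t.:} \quad p_k \alpha_{k,i} - c_i \sum_{l \neq k} p_l \alpha_{l,i} - s_i = c_i \sigma_i^2, \quad \forall i \in \mathcal{G}_k, \; \forall k \in \{1, \ldots, G\},$$
$$p_k \geq 0, \quad \forall k \in \{1, \ldots, G\},$$
$$s_i \geq 0, \quad \forall i \in \{1, \ldots, M\}.$$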

Problem MGPC is a Linear Program (LP), since the cost function and all constraints are linear. SeDuMi can be used again to solve it efficiently. Note that SeDuMi will also yield an infeasibility certificate in case the MGPC problem is not solvable for a particular beamforming configuration, which is useful in practice.

For $G = M$ (independent information transmission to each receiver), problem $\mathcal{R}$ is equivalent to, and not a relaxation of, X (see [1]), and problem MGPC reduces to the well-known multiuser downlink power control problem, which can be solved using simpler means (e.g., [3]): matrix inversion, but also iterative descent algorithms. In this special case, (in)feasibility can be determined from the spectral radius of a certain “connectivity” matrix. Similar simplifications for the general instance of MGPC are perhaps possible, but appear highly non-trivial. At any rate, LP routines are very efficient.
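For the special case $G = M$, the matrix-inversion route can be sketched as follows. The sketch assumes that all SINR constraints hold with equality at the optimum, which gives a linear system in the power factors, and it uses the spectral radius of a normalized cross-gain matrix as the feasibility test. The function name and the exact form of the matrix `B` are illustrative assumptions, not taken from [3].

```python
import numpy as np

def unicast_power_control(alpha, c, sigma2):
    """Power control for G = M, where user k is the only member of group k.

    alpha[l, i] = |w_l^H h_i|^2, c = SINR targets, sigma2 = noise variances.
    Returns (p, rho): p is None when the spectral radius rho is >= 1.
    """
    alpha = np.asarray(alpha, dtype=float)
    c = np.asarray(c, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    d = np.diag(alpha).copy()                # alpha[k, k]: useful signal gains
    # B[k, l] = c_k * alpha[l, k] / alpha[k, k] for l != k, zero on the diagonal
    B = (c / d)[:, None] * alpha.T
    np.fill_diagonal(B, 0.0)
    u = c * sigma2 / d
    rho = float(np.max(np.abs(np.linalg.eigvals(B))))
    if rho >= 1.0:
        return None, rho                     # targets cannot be met jointly
    # Tight constraints give p = B p + u, i.e. (I - B) p = u
    p = np.linalg.solve(np.eye(len(c)) - B, u)
    return p, rho
```

For example, with `alpha = [[4, 1], [1, 4]]`, unit SINR targets, and unit noise variances, both power factors come out equal to 1/3 and each user meets its target with equality.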

The overall algorithm for obtaining an approximate solution to problem X can thus be summarized as follows:

1. Relaxation: Solve problem $\mathcal{R}$, using SDP. Denote the solution $\{\mathbf{X}_k\}_{k=1}^{G}$.

2. Randomization / Scaling Loop: For each $k$, generate a vector in the span of $\mathbf{X}_k$, using the Gaussian randomization technique (randC) in [6]. If, for some $k$, $\mathrm{rank}(\mathbf{X}_k) = 1$, then use the principal component instead. Next, feed the resulting set of candidate beamforming vectors $\{\mathbf{w}_k\}_{k=1}^{G}$ into problem MGPC and solve it using LP. If the particular instance of MGPC is infeasible, discard the proposed set of candidate beamforming vectors; else, see if it yields a smaller MGPC objective than previously checked candidates. If so, record the solution and the associated objective value.
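The candidate-generation part of step 2 can be sketched as below. The sketch assumes that the Gaussian randomization draws a circularly symmetric complex Gaussian vector whose covariance equals the relaxed solution block, which is one common reading of randC in [6]; the function name and the rank tolerance are illustrative choices.

```python
import numpy as np

def randomize_candidate(X, rng, rank_tol=1e-8):
    """Draw one candidate beamformer in the span of the PSD matrix X.

    If X is numerically rank one, its scaled principal eigenvector is returned.
    Otherwise the candidate is U sqrt(D) v with v ~ CN(0, I), so that the
    expected outer product of the candidate equals X (assumed randC variant).
    """
    X = (X + X.conj().T) / 2                 # enforce Hermitian symmetry
    eigval, U = np.linalg.eigh(X)            # eigenvalues in ascending order
    eigval = np.clip(eigval, 0.0, None)      # remove tiny negative round-off
    if np.sum(eigval > rank_tol * eigval[-1]) == 1:
        return np.sqrt(eigval[-1]) * U[:, -1]
    N = X.shape[0]
    v = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    return U @ (np.sqrt(eigval) * v)
```

Each set of candidates produced this way would then be passed to the MGPC step, and only feasible sets with a smaller objective than the best one seen so far would be recorded.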

The quality of approximate solutions to problem X generated this way can be checked against the lower bound on transmit power obtained in solving problem $\mathcal{R}$. This bound can be further motivated from a duality perspective, as in [6]; that is, the aforementioned relaxation lower bound is in fact the tightest lower bound on the optimum of problem X attainable via Lagrangian duality [2]. This follows from arguments in [8] (see also the single-group case in [6]), due to the fact that problem X is a quadratically constrained quadratic program.

4.  SIMULATION  RESULTS 

The first step of the proposed algorithm consists of a relaxation of the original QoS beamforming problem X to problem $\mathcal{R}$. The original problem X may or may not be feasible; if it is, then so is problem $\mathcal{R}$. If $\mathcal{R}$ is infeasible, then so is X. The converse is generally not true; i.e., if $\mathcal{R}$ is feasible, X need not be feasible. In order to establish feasibility of X in this case, the randomization - MGPC loop should yield at least one feasible solution. This is most often the case, as will be verified in the sequel. If the randomization - MGPC loop fails to return at least one feasible solution, then the (in)feasibility of X cannot be determined. There is, therefore, a relatively small proportion of problem instances for which (in)feasibility of X cannot be decided using the proposed approach.

It is evident from the above discussion that feasibility is a key aspect of problem X and its proposed solution via problem $\mathcal{R}$ and the randomization - MGPC loop. Feasibility depends on a number of factors; namely, the number of transmit antenna elements $N$, the number and the populations of the multicast groups, $G$ and $G_k$ respectively, the channel characteristics $\mathbf{h}_i$, the channel noise variances $\sigma_i^2$, and finally the desired receive SINR constraints $c_i$.

Beyond feasibility, there are two key issues of interest. The first has to do with cases for which the solution to problem $\mathcal{R}$ yields an exact optimum of the original problem X. This happens when the $N \times N$ blocks $\mathbf{X}_k$, $k \in \{1, \ldots, G\}$, all turn out to be rank-one. In this case, the associated principal components solve the original problem X optimally, i.e., in such a case $\mathcal{R}$ is not a relaxation after all.$^1$ The second issue has to do with the quality of the final approximate solution to problem X in those cases where a feasible solution can be found using the proposed two-step algorithm. As in [6], a practical figure of merit for the quality of the final approximate solution (set of beamforming vectors and power scaling factors) is the ratio of the total transmitted power corresponding to the approximate solution over $\sum_{k=1}^{G} \mathrm{trace}(\mathbf{X}_k)$, the lower bound generated from the solution of $\mathcal{R}$.

We consider the standard i.i.d. Rayleigh fading model, i.e., the
elements of the channel vectors $\mathbf{h}_i$, $\forall i \in \{1, \ldots, M\}$,
are i.i.d. circularly symmetric complex Gaussian random variables of
variance 1. Tables 1 and 2 summarize the results obtained using the
proposed algorithm for 300 Monte-Carlo runs² and 1000 Gaussian
randomization samples each. The simulations are repeated for a variety
of choices for $N$, $M$ (see column 1). The users are considered to be
evenly distributed among the multicast groups, i.e., $G_k = M/G$,
$\forall k \in \{1, \ldots, G\}$. For each such configuration, the
problem is solved for increasing values (in dB, column 2) of the
received SINR constraints (same for all users), until problem
$\mathcal{R}$ becomes infeasible. The noise variance is set to
$\sigma^2 = 1$ for all channels. The percentage of the 300 Monte-Carlo
runs for which $\mathcal{R}$ is feasible is shown in column 3. Columns 4
and 5 report the percentage of $\mathcal{R}$-feasible solutions which
yield exact solutions to problem X (i.e., when all $\mathbf{X}_k$'s are
rank-one), and for which the ensuing randomization - MGPC loop yields at
least one feasible solution, respectively. Finally, the last column
holds the average value of the ratio of transmitted power corresponding
to the final approximate solution over the lower bound obtained from the
SDR solution.
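
To make the simulation setup concrete, here is a minimal plain-Python sketch of drawing i.i.d. Rayleigh channels of unit variance and evaluating the received SINR of each user for a given set of beamformers. The SINR expression (own-group signal power over inter-group interference plus noise) is the usual multigroup multicast definition and is assumed here, since it is defined outside this passage. All function names are hypothetical.

```python
import math
import random

def rayleigh_channel(n_antennas, rng):
    """N x 1 channel with i.i.d. circularly symmetric complex Gaussian
    entries of variance 1 (real and imaginary parts have variance 1/2)."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            for _ in range(n_antennas)]

def inner(w, h):
    """w^H h for two equal-length complex vectors."""
    return sum(wi.conjugate() * hi for wi, hi in zip(w, h))

def sinr(user_channel, user_group, beamformers, noise_var=1.0):
    """Power received from the user's own group beamformer, divided by
    the interference from the other groups' beamformers plus noise."""
    signal = abs(inner(beamformers[user_group], user_channel)) ** 2
    interference = sum(abs(inner(w, user_channel)) ** 2
                       for k, w in enumerate(beamformers) if k != user_group)
    return signal / (interference + noise_var)

def meets_targets(channels, groups, beamformers, target_db, noise_var=1.0):
    """True if every user reaches the common SINR target (given in dB)."""
    target = 10.0 ** (target_db / 10.0)
    return all(sinr(h, g, beamformers, noise_var) >= target
               for h, g in zip(channels, groups))
```

For one Monte-Carlo run with, say, $N = 8$ antennas and 12 users, one would call `rayleigh_channel(8, rng)` twelve times with a seeded `random.Random` instance and pass the resulting list, together with each user's group index, to `meets_targets`.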

The $\mathcal{R}$ feasibility percentage, and the percentage of cases
where $\mathcal{R}$ is equivalent to X, listed in columns 3 and 4, are
also plotted in Figures 1 and 2, versus the requested SINR values, for
most of the scenarios under consideration. It is observed that
$\mathcal{R}$ becomes more difficult to solve (for increasing values of
the SINR constraints) as the number $G$ and/or the population $G_k$ of
the multicast groups increases and/or the number $N$ of available
transmit antenna elements decreases. In all configurations considered,
the higher the target SINR, the less likely it is that problem
$\mathcal{R}$ is feasible, which is intuitive. Interestingly though, the
percentage of exact solutions to X generated via $\mathcal{R}$ also
increases with target SINR. It seems as if rank-one solutions are more
likely when operating close to the infeasibility boundary. Furthermore,
if the same number of users is distributed over more multicast groups
(thus, the number $G_k$ of users per group drops) the attainable common
SINR is reduced, as is perhaps intuitive. On the other hand, when the
target SINR is on the relatively low side, optimum solutions are more
frequently encountered in this case (e.g., see the case of 12 users
distributed in 2, 3, and 4 groups for SINR of 6 dB), since it is more
likely for the fewer users of any group to be spatially close (the
respective probability is approximately 1 / G°k ). Last but not least,
the randomization - MGPC loop yields a feasible solution with a
probability higher than 90% in most cases where $\mathcal{R}$ is
feasible; this solution entails transmission power that is under two
times (3 dB from) the possibly unattainable lower bound, on average.

¹ It is interesting to find the frequency of occurrence of such an
event, whose benefit is twofold: not only is the problem solved
optimally, but also at smaller complexity, since the randomization step
and the repeated solution of the ensuing MGPC problem are avoided.

² 3000 Monte-Carlo runs were employed in cases where $\mathcal{R}$ was
feasible in less than 10% of the 300 problem instances initially
considered. This was done to improve the estimation accuracy for
quantities conditioned on the feasibility of $\mathcal{R}$.

In some scenarios, $\mathcal{R}$ consistently yields an exact solution
of X. That is, the $\mathbf{X}_k$ blocks are all consistently rank-one.
In this case, no further randomization is needed; the principal
components of the extracted blocks are the optimal beamformers. More on
this will be included in [5].

5.  CONCLUSIONS 

Transmit beamformer design was considered in the context of co-channel
multicast transmission to multiple groups of users. The problem is a
generalization of downlink transmit beamforming of independent
information streams to individual users ([1] and references therein),
and of the single-group multicast beamforming in [6]. Using [6], the
general instance of the problem is easily shown to be NP-hard. A
two-step approach comprising semidefinite relaxation and a
randomization - multicast power control loop was proposed and shown to
yield high-quality approximate solutions, plus means of testing
feasibility, at manageable complexity cost.

Fig. 1. $\mathcal{R}$ feasibility percentages.

Fig. 2. $\mathcal{R}$ equivalence to X percentages.


Table 1. MC simulation results for QoS Beamforming (Rayleigh)

N/G x Gk | SINR (dB) | R feasible % | R = X % | MGPC % | mean
8/2 x 8  |  6 | 100   |  9.33 | 99.67 | 1.57
8/2 x 6  |  6 | 100   | 34.33 | 100   | 1.17
8/3 x 4  |  6 | 100   | 76.67 | 100   | 1.04
8/4 x 3  |  6 | 100   | 92.67 | 99.67 | 1.01
6/2 x 8  |  6 | 96.33 | 13.49 | 83.74 | 2.74
6/2 x 6  |  6 | 100   | 37.67 | 100   | 1.39
6/2 x 4  |  6 | 100   | 84    | 99.67 | 1.02
4/2 x 8  |  6 | 4.57  | 35.77 | 68.61 | 1.86
4/2 x 6  |  6 | 46.67 | 48.57 | 88.57 | 1.64
4/2 x 4  |  6 | 97.67 | 74.40 | 100   | 1.07
8/2 x 8  |  8 | 100   | 13    | 99.33 | 1.85
8/2 x 6  |  8 | 100   | 34.67 | 100   | 1.16
8/3 x 4  |  8 | 100   | 79.67 | 100   | 1.04
8/4 x 3  |  8 | 83    | 95.18 | 100   | 1.01
6/2 x 8  |  8 | 70.33 | 21.33 | 79.62 | 2.05
6/2 x 6  |  8 | 99.67 | 38.80 | 99.67 | 1.26
6/2 x 4  |  8 | 100   | 83.33 | 100   | 1.02
4/2 x 6  |  8 | 12.67 | 60.53 | 92.11 | 2.24
4/2 x 4  |  8 | 90    | 80.37 | 100   | 1.05

Table 2. MC simulation results for QoS Beamforming (Rayleigh)

N/G x Gk | SINR (dB) | R feasible % | R = X % | MGPC % | mean
8/2 x 8  | 10 | 100   | 13    | 99.67 | 1.92
8/2 x 6  | 10 | 100   | 37    | 99.67 | 1.17
8/3 x 4  | 10 | 99    | 80.81 | 99.33 | 1.04
8/4 x 3  | 10 | 43.4  | 97.31 | 98.92 | 1.00
6/2 x 8  | 10 | 30.67 | 36.96 | 84.78 | 1.64
6/2 x 6  | 10 | 98    | 44.90 | 96.94 | 1.46
6/2 x 4  | 10 | 100   | 82.67 | 100   | 1.02
4/2 x 6  | 10 | 1.97  | 74.58 | 93.22 | 1.39
4/2 x 4  | 10 | 74    | 82.43 | 99.10 | 1.04
8/2 x 8  | 12 | 97.67 | 17.41 | 96.93 | 1.75
8/2 x 6  | 12 | 100   | 37.33 | 100   | 1.15
8/3 x 4  | 12 | 91.67 | 87.64 | 100   | 1.04
8/4 x 3  | 12 | 11.73 | 97.44 | 99.72 | 1.00
6/2 x 8  | 12 | 5.1   | 49.02 | 84.31 | 1.99
6/2 x 6  | 12 | 86.33 | 52.51 | 98.07 | 1.37
6/2 x 4  | 12 | 100   | 86    | 99    | 1.02
4/2 x 4  | 12 | 51.33 | 86.36 | 99.35 | 1.14
8/2 x 8  | 14 | 90.33 | 32.84 | 95.94 | 2.11
8/2 x 6  | 14 | 100   | 40.67 | 100   | 1.13
8/3 x 4  | 14 | 73.33 | 92.27 | 100   | 1.04
8/4 x 3  | 14 | 1.93  | 96.55 | 100   | 1.10
6/2 x 6  | 14 | 68.67 | 64.08 | 97.09 | 1.21
6/2 x 4  | 14 | 100   | 87    | 100   | 1.01
4/2 x 4  | 14 | 32.33 | 90.72 | 97.94 | 1.04
8/2 x 8  | 16 | 70.67 | 48.11 | 95.28 | 1.63
8/2 x 6  | 16 | 100   | 48    | 100   | 1.11
8/3 x 4  | 16 | 51.33 | 92.86 | 100   | 1.03
6/2 x 6  | 16 | 49    | 68.71 | 92.28 | 1.15
6/2 x 4  | 16 | 100   | 88.33 | 99.33 | 1.01
4/2 x 4  | 16 | 18.33 | 90.91 | 100   | 1.01
8/2 x 8  | 18 | 48.67 | 57.53 | 94.52 | 1.28
8/2 x 6  | 18 | 100   | 55    | 100   | 1.10
8/3 x 4  | 18 | 31    | 93.55 | 100   | 1.02
6/2 x 6  | 18 | 33.67 | 79.21 | 98.02 | 1.13
6/2 x 4  | 18 | 100   | 87.67 | 99.33 | 1.01
4/2 x 4  | 18 | 8.53  | 95.70 | 98.83 | 1.02
8/2 x 8  | 20 | 30    | 64.44 | 97.78 | 1.29
8/2 x 6  | 20 | 100   | 57.33 | 100   | 1.08
8/3 x 4  | 20 | 19    | 92.98 | 98.25 | 1.01
6/2 x 6  | 20 | 17    | 78.43 | 96.08 | 1.15
6/2 x 4  | 20 | 100   | 89    | 100   | 1.01
4/2 x 4  | 20 | 4.37  | 96.95 | 98.47 | 1.02
8/2 x 8  | 22 | 15.67 | 72.34 | 95.74 | 1.29
8/2 x 6  | 22 | 100   | 61    | 100   | 1.08
8/3 x 4  | 22 | 6.93  | 95.19 | 99.04 | 1.02
6/2 x 6  | 22 | 10    | 80    | 96.67 | 1.37
6/2 x 4  | 22 | 100   | 91    | 100   | 1.01
4/2 x 4  | 22 | 1.83  | 98.18 | 98.18 | 1.00
8/2 x 8  | 24 | 6.33  | 78.95 | 94.74 | 1.39
8/2 x 6  | 24 | 100   | 64    | 100   | 1.07
8/3 x 4  | 24 | 2.76  | 96.39 | 98.80 | 1.02
6/2 x 6  | 24 | 4.37  | 90.84 | 96.95 | 1.12
6/2 x 4  | 24 | 100   | 91    | 98.33 | 1.01
8/2 x 8  | 26 | 2     | 83.33 | 83.33 | 1.00
8/2 x 6  | 26 | 99    | 65.66 | 99.63 | 1.07
8/3 x 4  | 26 | 1.37  | 95.12 | 100   | 1.01
6/2 x 6  | 26 | 1.9   | 96.49 | 100   | 1.03
6/2 x 4  | 26 | 100   | 91.33 | 99    | 1.01
8/2 x 6  | 28 | 100   | 65.67 | 98.67 | 1.07
6/2 x 4  | 28 | 98.33 | 91.28 | 99.33 | 1.01
8/2 x 6  | 30 | 98.67 | 66.55 | 99.32 | 1.07
ABSTRACT

Given a set of pairwise distance estimates between nodes, it is often
of interest to generate a map of node locations. This is an old problem
that has attracted renewed interest in the signal processing community,
due to the recent emergence of wireless sensor networks and ad-hoc
networks. Sensor maps are useful for estimating the spatial distribution
of measured phenomena, as well as for routing purposes. Both centralized
and decentralized solutions have been developed, along with ways to cope
with missing data, account for the reliability of individual
measurements, etc. We revisit the basic version of the problem, and
propose a two-stage algorithm that combines algebraic initialization and
gradient descent. In particular, we borrow an algebraic solution from
the database literature and adapt it to the sensor network context,
using a specific choice of anchor/pivot nodes. The resulting estimates
are fed to a gradient descent iteration. The overall algorithm offers
better performance at lower complexity than existing centralized
full-connectivity solutions. Also, its performance is relatively close
to the corresponding Cramér-Rao bound, especially for small values of
range error variance.

1.  INTRODUCTION 

The problem of node localization from pairwise distance estimates has
recently attracted renewed interest in the signal processing and
communications literature (e.g., [1, 3, 4]), owing to the recent
interest in wireless sensor networks and ad-hoc networks. Given a matrix
of pairwise distances (usually estimated using received signal strength
measurements and a path loss model), the localization problem asks to
determine the relative node locations that generate these distances. In
other words, one seeks a map of sensor locations with a given
(approximate) distance structure. This is a classic problem originating
in psychometrics [5, 6], known as Multi-Dimensional Scaling (MDS).

There are many MDS flavors and variants; perhaps the single most
important version is metric MDS. The classic approach to solving MDS is
based on computing the principal components of a double-centered version
of the distance matrix. This works well (albeit not optimally, due to
the double centering), but its complexity is cubic in the number of
nodes, and thus does not scale well with network size. A popular
alternative to principal component analysis (PCA) is the use of gradient
descent or other numerical optimization tools that aim to optimize a
stress function. The stress function measures the error between the
given distances and those reproduced by a given configuration of points.
The drawback of gradient descent and related approaches is that they
require accurate initialization.

1 Contact Author. E-mail: nikos@telecom.tuc.gr, Fax: +30-28210-

We propose a two-stage MDS algorithm that employs an algebraic
initialization procedure followed by gradient descent. The algebraic
initialization is based on the Fastmap [2] algorithm, borrowed from the
database literature. Fastmap is a linear-complexity mapping tool, which
is, however, sensitive to range measurement errors. Because distances
are invariant to coordinate frame transformations (rotation, reflection,
shift), there is a need to employ three so-called anchor nodes, whose
positions are accurately known (e.g., via GPS), in order to fix a
desired coordinate frame. Unfortunately, Fastmap is very sensitive to
coordinate alignment, because the estimated position of every node (and
thus of the anchor nodes as well) is based only on distances to selected
pivot nodes, so there is no averaging. In order to mitigate this
problem, we advocate a particular choice of anchor/pivot nodes, placed
at the outer edges of the network. This placement bypasses the need for
alignment and thus alignment errors, thereby providing a high-quality
initialization to the gradient descent. The overall algorithm affords
better localization accuracy than PCA-based MDS, at substantially lower
complexity cost (quadratic in the number of nodes).

The rest of this paper is structured as follows. In Section 2 we
explain in detail the PCA-based MDS algorithm, and its alternative
implementations. The Fastmap algorithm is briefly reviewed in Section 3.
In Section 4 we describe the proposed Fastmap-MDS algorithm. Simulation
results regarding the performance of the above three algorithms, and the
Cramér-Rao Lower Bound for the particular localization problem, are
shown in Section 5, and conclusions are drawn in Section 6.

2.  MULTIDIMENSIONAL  SCALING 

Multidimensional Scaling (MDS) [4, 5, 6] is a method used to depict the
spatial structure of distance-like data using the dissimilarity measure
among them. It has its origins in psychometrics and psychophysics. MDS
starts by presuming that the dissimilarities of each pair of objects
stem from data points in an $m$-dimensional space. In most cases the
space in which the data is placed is 2- or 3-dimensional. The algorithm
aims to find a geometric representation of the data, such that the
distances between data points fit as well as possible to the given
dissimilarity information.

We denote the dissimilarity measure (the estimated distances in our
case) between objects $i$ and $j$ as $d_{ij}$. The set of the
dissimilarities forms the matrix $D$. We also let $\hat{d}_{ij}$ denote
the Euclidean distance between two points
$X_i = (x_{i1}, x_{i2}, \ldots, x_{im})$ and
$X_j = (x_{j1}, x_{j2}, \ldots, x_{jm})$, i.e.,

$$\hat{d}_{ij} = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2} \qquad (1)$$

where $m$ is usually 2 or 3.

In classical metric MDS, we estimate the node coordinates $X$ by
computing the $m$ principal components of a double-centered and
element-wise squared version of the matrix $D$, denoted by $B$:

$$B = -\frac{1}{2} J P J \qquad (2)$$

where $P$ is the matrix of squared distance measures, and $J$ is the
centering operator, i.e.,

$$J = I - e e^T / N, \qquad (3)$$

with $e$ denoting the all-ones vector and $N$ denoting the number of
objects (sensor nodes). For an $N \times N$ matrix $D$ and for $m$
dimensions, it can be shown that

$$-\frac{1}{2}\left( d_{ij}^2 - \frac{1}{N}\sum_{k=1}^{N} d_{kj}^2 - \frac{1}{N}\sum_{k=1}^{N} d_{ik}^2 + \frac{1}{N^2}\sum_{k=1}^{N}\sum_{l=1}^{N} d_{kl}^2 \right) = \sum_{k=1}^{m} x_{ik} x_{jk} \qquad (4)$$

thus the estimated node coordinates are given by the $m$ principal
eigenvectors of the matrix $B$, scaled by the square roots of the
corresponding eigenvalues. With $U_r$ denoting the $m$ principal
eigenvectors and $V_r$ the diagonal matrix containing the corresponding
eigenvalues, $B_r = U_r V_r U_r^T$ is an optimal least squares
approximation of $B$, and $X_r = U_r V_r^{1/2}$ is an approximation of
the node coordinates in $m$-dimensional space, up to a common coordinate
rotation, reflection, and shift. An alignment procedure is necessary to
transform the estimated node locations to a desired frame of reference.
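
The PCA-based procedure built on (2) and (3) can be sketched as follows. This is an illustrative plain-Python version, not the authors' implementation: to stay dependency-free it finds the $m$ principal eigenpairs of $B$ by power iteration with deflation instead of a full eigendecomposition, which assumes the dominant eigenvalues of $B$ are positive. The function names are hypothetical.

```python
import math
import random

def double_center(D):
    """B = -1/2 * J P J, with P the element-wise squared distances."""
    n = len(D)
    P = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(P[i]) / n for i in range(n)]
    col = [sum(P[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[-0.5 * (P[i][j] - row[i] - col[j] + tot) for j in range(n)]
            for i in range(n)]

def top_eigenpair(B, rng, iters=1000):
    """Dominant eigenvalue/eigenvector of a symmetric matrix (power iteration)."""
    n = len(B)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        y = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in y))
        if norm == 0.0:
            break
        v = [c / norm for c in y]
    lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

def classical_mds(D, m=2, seed=0):
    """Coordinates X_r = U_r V_r^(1/2), recovered only up to a common
    rotation, reflection and shift."""
    rng = random.Random(seed)
    B = double_center(D)
    n = len(B)
    coords = [[0.0] * m for _ in range(n)]
    for k in range(m):
        lam, u = top_eigenpair(B, rng)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][k] = scale * u[i]
        # Deflate: remove the component just found before the next pass.
        B = [[B[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return coords
```

With exact distances, the recovered configuration reproduces all pairwise distances; an alignment step is still needed to express it in a desired frame of reference.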

Direct minimization of a suitable stress function is an alternative to
PCA-based MDS [5]. A common stress function is

$$\mathrm{stress}^2 = \sum_{i<j} (\hat{d}_{ij} - d_{ij})^2, \qquad (5)$$

where $\hat{d}_{ij}$ is the Euclidean distance between the estimated
positions of nodes $i$ and $j$ and $d_{ij}$ is the corresponding
measured dissimilarity. Minimization starts with an initial guess of the
node positions (often random), followed by gradient descent iterations.
Initialization matters a lot in this context, because the stress
function is multimodal. Furthermore, the number of iterations required
for convergence depends heavily on the quality of the initialization.


3.  FASTMAP 


The basic element of Fastmap [2] is the projection of the objects on a
properly selected line. This is achieved by selecting two objects $O_a$,
$O_b$, called pivots, and projecting all other objects on the line that
passes through them. A pair of pivots is chosen for each of the $m$
dimensions. The coordinates (i.e., projections on the pivot line) of the
objects can be found by employing the cosine law [2]. Thus, the first
coordinate for object $O_i$ is given by:

$$x_i = \frac{d_{ai}^2 + d_{ab}^2 - d_{bi}^2}{2 d_{ab}} \qquad (6)$$

where $d_{ij}$ is the dissimilarity measure between nodes $i$ and $j$
and $a$, $b$ are the pivot objects. After computing these coordinates
for each object $O_i$, we consider a hyperplane which is orthogonal to
the pivot line. We then project the objects on this hyperplane, and
repeat the process, this time using

$${d'}_{ij}^{2} = d_{ij}^2 - (x_i - x_j)^2, \quad i, j = 1, \ldots, N. \qquad (7)$$

A heuristic method is proposed in [2] for choosing the pivots as far as
possible from one another.
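
The projection-and-residual recursion described above can be sketched in a few lines of plain Python. This is an illustration rather than the reference Fastmap code: the pivots are passed in explicitly (no pivot-selection heuristic), residual squared distances are clamped at zero to cope with noisy inputs, and the function name is hypothetical.

```python
import math

def fastmap(D, pivots):
    """Fastmap coordinates from a full distance matrix D.

    pivots is a list of (a, b) index pairs, one pair per output dimension;
    each coordinate is measured from pivot a towards pivot b.
    """
    n = len(D)
    # Work on squared distances so the residual update is a subtraction.
    d2 = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for a, b in pivots:
        dab = math.sqrt(d2[a][b])  # assumed nonzero (distinct pivots)
        # Cosine law: projection of object i on the line through a and b.
        x = [(d2[a][i] + d2[a][b] - d2[b][i]) / (2.0 * dab) for i in range(n)]
        for i in range(n):
            coords[i].append(x[i])
        # Project onto the hyperplane orthogonal to the pivot line.
        # Clamping at zero guards against noisy, non-Euclidean inputs.
        d2 = [[max(d2[i][j] - (x[i] - x[j]) ** 2, 0.0) for j in range(n)]
              for i in range(n)]
    return coords
```

As a usage example under assumed anchor positions, placing nodes 0, 1, 2 at (0, 0), (1, 0), (1, 1) and calling `fastmap(D, [(0, 1), (1, 2)])` with noise-free distances returns the node coordinates directly in that frame, which is the effect the anchor/pivot placement aims for.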

In database applications there is no “natural” or preferred coordinate
frame of reference, thus the final alignment step is not used, and
anchors are not needed. In the context of sensor networks, however,
obtaining absolute position estimates is important. Unfortunately,
Fastmap is very sensitive to coordinate alignment, because the estimated
position of every node (and thus of the anchor nodes as well) is based
only on distances to the chosen pivot nodes, so there is no averaging.
In order to mitigate this problem, we advocate a particular choice of
anchor/pivot nodes, placed at the outer edges of the network. In
particular, we assume that the sensor nodes are spread over a square,
and place the anchor nodes, which will also serve as pivots, at three
vertices (see Fig. 1). This placement bypasses the need for alignment
and thus alignment errors, thereby providing a high-quality
initialization to the gradient descent. Anchors #1 and #2 serve as
pivots for determining the coordinates in the first dimension, while
anchors #2 and #3 double as pivots for the second dimension.


4.  TWO-STAGE  FASTMAP-MDS  APPROACH 

Fastmap is a fast algebraic method that is rather sensitive to
measurement errors, particularly so in the final alignment step. In our
context, this sensitivity can be mitigated by proper use of anchor/pivot
nodes. The resulting estimates can be used as initialization for
gradient descent. Each step of gradient descent costs $O(N^2)$. Assuming
good-enough initialization, only a few gradient descent steps will be
needed. This suggests that a substantial complexity reduction relative
to PCA is possible. Interestingly, estimation accuracy can be improved
as well, as we will see.

The basic steps of the two-stage algorithm are shown in Table 1.

Fig. 1. Anchor-Pivot node placement for using Fastmap in sensor network
localization

Table 1. The 2-D Hybrid Fastmap-MDS Algorithm

1. Run Fastmap using as pivots the anchor nodes, which are placed on
   three vertices of the square distribution area. Let $X$ be the vector
   which contains all the estimated coordinates returned by Fastmap.

2. Determine $p$, $\lambda$

3. For i = 1 to p
   begin
     • evaluate $\nabla\,\mathrm{stress}$ at the point $X$
     • $X = X - \lambda \nabla\,\mathrm{stress}$
   end

4. Output: $X$

Denoting by $(x_i, y_i)$ the estimated position of node $i$, the partial
derivative of the stress function in (5) is given, up to a positive
scale factor that can be absorbed into the step size $\lambda$, by

$$\frac{\partial\,\mathrm{stress}}{\partial x_i} = \sum_{j \neq i} \frac{\left(\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} - d_{ij}\right)(x_i - x_j)}{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}} \qquad (8)$$

with a similar expression for the partial derivative with respect to
$y_i$. For simplicity, but also to bound complexity, a fixed number
$p = 10$ of gradient descent steps is used in our simulations.
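
The refinement stage (steps 2 to 4 of Table 1) can be sketched as below: a plain-Python illustration, not the authors' code, of fixed-step gradient descent for the 2-D case using the partial derivative in (8). The function names are hypothetical, and the default step size and iteration count are placeholders that would need tuning to the network size.

```python
import math

def stress_sq(X, D):
    """Sum over pairs i<j of (Euclidean distance - measured distance)^2."""
    n = len(X)
    return sum((math.hypot(X[i][0] - X[j][0], X[i][1] - X[j][1]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def stress_gradient(X, D):
    """Partial derivatives with respect to every (x_i, y_i), following (8)."""
    n = len(X)
    grad = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx, dy = X[i][0] - X[j][0], X[i][1] - X[j][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:  # coincident estimates: direction undefined
                continue
            w = (dist - D[i][j]) / dist
            grad[i][0] += w * dx
            grad[i][1] += w * dy
    return grad

def refine(X0, D, lam=0.01, p=10):
    """Run p fixed-step iterations X <- X - lam * gradient, each O(N^2)."""
    X = [list(pt) for pt in X0]
    for _ in range(p):
        g = stress_gradient(X, D)
        X = [[X[i][0] - lam * g[i][0], X[i][1] - lam * g[i][1]]
             for i in range(len(X))]
    return X
```

In the two-stage scheme, `X0` would be the Fastmap output and `D` the matrix of measured distances; all positions are updated simultaneously at each step.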


5.  RESULTS 

We compare the three algorithms described above, in the context of node
localization in sensor networks. We consider that the network has full
connectivity, that is, we have distance estimates for every pair of
nodes. The distance estimates are assumed to contain an error which is
proportional to the true distance between the nodes. Thus, we model the
distance estimates as

$$d_{ij} = \rho_{ij} + \rho_{ij}\,\mathcal{N}(0, e_r), \qquad (9)$$

where $\rho_{ij}$ is the true distance between nodes $i$ and $j$ and
$e_r$ is the measurement range error variance. Network nodes are
considered to be uniformly distributed in a square with area equal to 1,
i.e., the $x$ and $y$ coordinates of the sensor nodes are assumed
uniformly distributed in $[0, 1]$. We employ the alignment procedure
described in [3], in order to find the actual coordinates, and adopt
root mean squared error (RMSE) as our estimation performance metric,
computed from $x_{ei}$, $y_{ei}$, the estimated coordinates, and
$x_{ri}$, $y_{ri}$, the actual sensor coordinates. The baseline MDS
algorithm is based on PCA. The complexities of the three algorithms are
summarized in Table 2.

Table 2. Computational complexities

Algorithm          | Complexity
Fastmap            | $O(mN)$
Hybrid Fastmap-MDS | $O(pmN^2)$, $p \ll N$
MDS with SVD       | $O(N^3)$

In Fig. 2 we show the RMSE performance of the three methods for a sensor
network with 80 sensors, as a function of $e_r$. The corresponding
Cramér-Rao Bound (CRB) is also plotted as a benchmark¹. The parameter
$\lambda$ of the hybrid algorithm is set to 0.01 for this experiment. We
observe that Fastmap exhibits poor performance, while PCA-based MDS and
the proposed two-stage algorithm have better performance, as expected.
Interestingly, the proposed algorithm is not only less complex, but also
more accurate than PCA-MDS. This is partially attributed to the fact
that PCA-MDS uses double centering, which colors the noise, whereas the
proposed algorithm directly minimizes the stress function. We also
observe that the Hybrid algorithm is relatively close to the CRB,
especially for low range error variance.

In Fig. 3 we show corresponding performance results and the CRB for a
network with 200 nodes. The $\lambda$ parameter is set to 0.005. The
estimation accuracies of both PCA-MDS and the proposed two-stage
algorithm improve, as expected, relative to the previous case. Fastmap
does not benefit, due to the lack of (implicit or explicit) averaging.

We now compare the three algorithms over an additive white noise
measurement model, i.e., the measurements have the following form

$$d_{ij} = \rho_{ij} + \mathcal{N}(0, e_r), \qquad (11)$$

where the variance of the measurement error is independent of the
distance between the two nodes. The results are shown in Fig. 4 for the
case of 80 sensor nodes, and in Fig. 5 for the case of 200 nodes. We
observe again that the Hybrid algorithm exhibits better performance than
the other two.

¹ CRB derivations are omitted due to space considerations, but will be
included in the journal version.

6. CONCLUSIONS

We have proposed a two-stage hybrid localization algorithm that offers a
better accuracy-complexity trade-off than existing alternatives in the
context of sensor networks. The new algorithm employs Fastmap, coupled
with judicious selection of anchor nodes that double as pivots, to
generate a computationally cheap yet sufficiently accurate
initialization for gradient descent. Our simulations indicate that the
overall algorithm outperforms PCA-based MDS both in terms of complexity
and in terms of estimation accuracy. Future work will include pertinent
modifications of this idea that are well-suited for distributed
computation and missing data.
Fig. 2. RMSE performance vs measurement range error variance. N=80, all
pairwise distance estimates collected. Measurement error proportional to
the actual distance. 100 Monte Carlo runs.

Fig. 3. RMSE performance vs measurement range error variance. N=200
sensor nodes, all pairwise distance estimates collected. Measurement
error proportional to the actual distance. 100 Monte Carlo runs.

Fig. 4. RMSE performance vs measurement range error variance. N=80,
additive noise measurement model, all pairwise distance estimates
collected. 100 Monte Carlo runs.

Fig. 5. RMSE performance vs measurement range error variance. N=200
sensor nodes, all pairwise distance estimates collected. Additive noise
measurement model. 100 Monte Carlo runs.

Figure legend entries: SVD-MDS; Fastmap with fixed pivot; Hybrid
Fastmap-MDS; CRLB.
ABSTRACT

We consider the problem of transmit downlink beamforming for wireless
transmission in the context of certain broadcasting or multicasting
applications wherein Channel State Information (CSI) is available at the
transmitter, and a common message is to be transmitted to the users.
Unlike the usual “blind” isotropic broadcasting scenario, the
availability of CSI allows transmit optimization. We adopt a minimum
transmission power criterion, subject to prescribed minimum received
Signal-to-Noise Ratio (SNR) at each of the intended receivers. We also
consider a related max-min SNR “fair” problem formulation subject to a
transmit power constraint. The basic problem is non-convex and thus
difficult to solve; however, we show that a suitable reformulation
allows the application of semidefinite relaxation (SDR) techniques. SDR
yields a (generally approximate) solution, but in many cases our
solution is optimal, and in most cases it is within 3-4 dB from the
optimal solution, which is often good enough in our intended
applications. While the focus of the paper is on a wireless
communication scenario, we also discuss related problems in downstream
precoding for broadcasting in digital subscriber line systems.

1.  INTRODUCTION 

Consider a transmitter that utilizes an antenna array to broadcast
(common) information to multiple radio receivers (with a single antenna)
within a certain service area. The traditional approach to broadcasting
is to radiate transmission power isotropically, or with a fixed
directional pattern. While such an approach has the advantage that it is
channel independent, it may incur a substantial performance penalty.
Furthermore, in modern digital video/audio/data broadcasting and
multicasting applications, it is often plausible to assume that the
transmitter can acquire channel state information (CSI) for all its
intended receivers. This is relatively straightforward in fixed wireless
systems and Time-Division-Duplex (TDD) systems, but it can also be
accomplished in more general scenarios through the use of beacon
signals, periodically transmitted from the broadcasting station (and
typically embedded in the transmission). The receiving radios can then
feed back their CSI through a feedback channel. For the moment, we shall
assume that all channels are perfectly known at the transmitter site.
Most of these assumptions can be alleviated, up to a certain extent, at
the expense of graceful performance degradation relative to the
idealized conditions postulated above.

The key idea is this: If the transmitter has CSI for all the radios that
it intends to broadcast to, then it makes sense to attempt to minimize
total transmission power (and thus leakage to neighboring co-channel
transmissions), subject to meeting constraints on the received
Signal-to-Noise Ratio (SNR) for each individual intended receiver. Note
that this is a Quality of Service (QoS) guarantee that directly
translates to a guaranteed minimum information rate for each of the
receivers. Also note that different receivers may have different SNR
requirements, due to differing traffic requirements, and different noise
and interference conditions.

Another application of the methodology developed herein can be found in
downstream multicast transmission for multi-carrier and single-carrier
Digital Subscriber Line (DSL) systems. In this context, (linear)
precoding of multiple DSL loops in the same binder that wish to
subscribe to a common service (e.g., news feed, video-conference, or
movie multicast) can be employed to improve quality of service and/or
reduce far-end crosstalk (FEXT) interference to other loops in the
binder. In cases wherein the Customer-Premise Equipment (CPE) receivers
are not physically co-located (as in residential service), or cannot be
coordinated (as in legacy CPE systems), multiuser decoding of the
downstream transmission is not feasible, while transmit precoding is
viable. The most important difference between DSL and the wireless
multicast scenario considered so far is that DSL channels are
diagonally-dominant. That said, exploitation of the crosstalk coupling
to reduce FEXT levels to other loops in the binder offers the potential
for considerable gains in the management of mutual interference.

It is interesting to note that, as of today, internet multicasting
(using the internet protocol's Multicast Backbone, MBone) is performed
at the network layer, i.e., via packet-level flooding or spanning-tree
access of the participant nodes and any intermediate nodes needed to
access the participants. Instead, what we advocate herein is judicious
physical layer multicasting, which is enabled by i) the availability of
multiple transmitting elements; ii) exploiting opportunities for joint
beamforming/precoding; and iii) the availability of CSI at the
transmitting node or one of its proxies. This is a cross-layer
optimization approach that exploits information that is made available
at the physical layer to reduce relay retransmissions at the network
layer. This provides the potential for congestion relief and
considerable Quality of Service (QoS) gains.

Notation: We use lowercase boldface letters to denote column vectors, and
uppercase bold letters to denote matrices. $(\cdot)^T$ denotes transpose,
while $(\cdot)^H$ denotes Hermitian (conjugate) transpose.
$\mathrm{Re}\{\cdot\}$ ($\mathrm{Im}\{\cdot\}$) extracts the real
(respectively, imaginary) part of its argument.


0-7803-8545-4/04/$20.00  ©2004  IEEE 




2.  DATA  MODEL  AND  PROBLEM  STATEMENT 

We assume that each radio receiver employs a single receive antenna (and
thus a single receiver front-end and downconversion chain), as is
appropriate for simplicity and cost considerations in broadcasting
applications. Let $\mathbf{h}_i$ denote the $N \times 1$ complex vector
modeling propagation loss and phase shift from each of the N transmitting
antenna elements to the receiving antenna of user
$i \in \{1, \dots, M\}$. This model assumes that the channels between the
transmitter and the receivers are flat in frequency over the bandwidth of
the transmitted signal, but, as we will demonstrate below, the principles
of our design can be extended to the frequency-selective case in a
straightforward manner.

If we let $\mathbf{w}^H$ denote the weight vector applied to the N
transmitting antenna elements, then the problem of interest is to
minimize the transmitted power (of a white data sequence), subject to the
received signal power of user i being larger than a threshold $c_i$. This
problem can be written as

$$
\begin{aligned}
\min \quad & \|\mathbf{w}\|_2^2 \\
\text{subject to:} \quad & |\mathbf{w}^H \mathbf{h}_i|^2 \ge c_i, \quad i \in \{1, \dots, M\},
\end{aligned}
$$

where $\mathbf{w} \in \mathbb{C}^N$. This is a quadratically constrained
quadratic programming problem, but unfortunately the constraints are not
convex.
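
As a small illustration of the formulation above, the following sketch
evaluates the transmit power and the received signal powers for a
candidate weight vector and checks the constraints. It is not part of the
original development: the function names (`inner_h`, `transmit_power`,
`received_powers`, `is_feasible`) and the toy channel values are
assumptions made purely for illustration.

```python
def inner_h(w, h):
    """Return w^H h for complex vectors stored as Python lists."""
    return sum(wk.conjugate() * hk for wk, hk in zip(w, h))


def transmit_power(w):
    """Squared Euclidean norm ||w||_2^2 (the objective)."""
    return sum(abs(wk) ** 2 for wk in w)


def received_powers(w, channels):
    """|w^H h_i|^2 for every user i."""
    return [abs(inner_h(w, h)) ** 2 for h in channels]


def is_feasible(w, channels, c, tol=1e-9):
    """True if every constraint |w^H h_i|^2 >= c_i holds (up to tol)."""
    return all(p >= ci - tol
               for p, ci in zip(received_powers(w, channels), c))


# Hypothetical toy data: N = 2 antennas, M = 2 users.
channels = [[1 + 1j, 0.5j], [0.2 + 0j, 1 - 1j]]
c = [1.0, 0.5]
w = [0.6 + 0.6j, 0.4 - 0.3j]

print(transmit_power(w), received_powers(w, channels),
      is_feasible(w, channels, c))
```

Because each constraint bounds a convex function of w from below, the
feasible set is the complement of a convex region; the check itself is
nevertheless trivial for any given candidate.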

2.1.  Review  of  Pertinent  Prior  Art 

The above problem is reminiscent of some closely-related problems. For
M = 1, the optimum w is a matched filter. When the channel vectors span a
ball or ellipsoid about a "nominal" channel vector (a model that implies a
continuum of intended receivers), the problem can be solved exactly using
second-order cone programming, as shown in [8]. The key observation is
that one can convert the infinitely-many non-convex constraints over the
ball into a single convex constraint, by taking advantage of rotational
freedom and the Cauchy-Schwarz inequality to explicitly construct the
worst-case channel vector within the said ball. Unfortunately, we are not
aware of a corresponding conversion for finitely-many channel vectors
(intended receivers).

Another closely-related work is that in [1] (and references therein),
which considers the problem of multiuser transmit beamforming for the
cellular downlink. The key difference between [1] and our formulation is
that the authors of [1] consider the transmission of independent
information to each of the downlink users, whereas we focus on the
broadcast of common information. The mathematical formulations of these
problems are not equivalent. A simple way to see this is to note that in
the generic case of our formulation most of the SNR constraints will be
inactive at the optimum (i.e., most of the constraints will be
over-satisfied). Consider, e.g., the case of two closely-located
receivers with different SNR requirements: one of the two associated
constraints will be over-satisfied at the optimum. On the other hand, it
is proven in [1] that, in the cellular downlink problem, the constraints
are always met with equality at the optimum. The important common
denominator of our work and [1] is the use of semidefinite programming
tools.


3.  RELAXATION 

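Towards solving our problem, we first recast it as follows:

$$
\begin{aligned}
\min_{\mathbf{w}} \quad & \mathrm{trace}(\mathbf{w}\mathbf{w}^H) \\
\text{subject to:} \quad & \mathrm{trace}(\mathbf{w}\mathbf{w}^H \mathbf{Q}_i) \ge c_i, \quad i \in \{1, \dots, M\},
\end{aligned}
$$

where we have used the fact that
$\mathbf{h}_i^H \mathbf{w}\mathbf{w}^H \mathbf{h}_i = \mathrm{trace}(\mathbf{h}_i^H \mathbf{w}\mathbf{w}^H \mathbf{h}_i) = \mathrm{trace}(\mathbf{w}\mathbf{w}^H \mathbf{h}_i \mathbf{h}_i^H)$,
and $\mathbf{Q}_i := \mathbf{h}_i \mathbf{h}_i^H$.
Now consider the following reformulation of the problem:

$$
\begin{aligned}
\min_{\mathbf{X} \in \mathbb{C}^{N \times N}} \quad & \mathrm{trace}(\mathbf{X}) \\
\text{subject to:} \quad & \mathrm{trace}(\mathbf{X}\mathbf{Q}_i) \ge c_i, \quad i \in \{1, \dots, M\}, \\
& \mathbf{X} \succeq 0, \\
& \mathrm{rank}(\mathbf{X}) = 1,
\end{aligned}
$$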

where now X is an $N \times N$ complex matrix, and the inequality
$\mathbf{X} \succeq 0$ means that the matrix X is symmetric positive
semidefinite. Note that, in the above equivalent formulation of our
problem, the cost function is linear in X; the trace constraints are
linear inequalities in X, and the set of symmetric positive semidefinite
matrices is convex; however the rank constraint on X is not convex. The
important observation is that the above problem is in a form suitable for
semidefinite relaxation (SDR) (e.g., see [4]). That is, by dropping the
rank-one constraint, one obtains the relaxed problem

$$
\begin{aligned}
\min_{\mathbf{X} \in \mathbb{C}^{N \times N}} \quad & \mathrm{trace}(\mathbf{X}) \\
\text{subject to:} \quad & \mathrm{trace}(\mathbf{X}\mathbf{Q}_i) \ge c_i, \quad i \in \{1, \dots, M\}, \quad \text{and } \mathbf{X} \succeq 0,
\end{aligned}
$$

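which is a semidefinite programming problem (SDP), albeit not yet in
standard form. In order to put it in a standard form, we add M
non-negative "slack" variables $s_i$, one for each trace constraint. In
this way, we obtain the following formulation

$$
\begin{aligned}
\min_{\mathbf{X} \in \mathbb{C}^{N \times N}} \quad & \mathrm{vec}(\mathbf{I}_N)^T \mathrm{vec}(\mathbf{X}) \\
\text{subject to:} \quad & \mathrm{vec}(\mathbf{Q}_i^T)^T \mathrm{vec}(\mathbf{X}) - s_i = c_i, \quad i \in \{1, \dots, M\}, \\
& s_i \ge 0, \; i \in \{1, \dots, M\}, \quad \text{and } \mathbf{X} \succeq 0,
\end{aligned}
$$

which is now expressed in a standard form used by SDP solvers, such as
SeDuMi [6].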
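
The standard form above rests on the identity
trace(XQ) = vec(Q^T)^T vec(X). The short sketch below checks this
identity numerically for a rank-one X = w w^H, and also confirms that the
result equals |w^H h|^2. The helper names and the test vectors are
hypothetical and serve only as an illustration.

```python
def vec(A):
    """Stack the columns of A (a list of rows) into a single list."""
    rows, cols = len(A), len(A[0])
    return [A[r][c] for c in range(cols) for r in range(rows)]


def transpose(A):
    """Plain (non-conjugate) transpose."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def trace(A):
    return sum(A[k][k] for k in range(len(A)))


def outer_hermitian(v):
    """Return the rank-one matrix v v^H."""
    return [[vi * vj.conjugate() for vj in v] for vi in v]


# Hypothetical test vectors with N = 3.
h = [1 + 2j, -0.5j, 0.3 + 0j]
w = [0.2 - 1j, 1.0 + 0j, 0.7j]
Q = outer_hermitian(h)
X = outer_hermitian(w)

lhs = trace(matmul(X, Q))                                    # trace(XQ)
rhs = sum(q * x for q, x in zip(vec(transpose(Q)), vec(X)))  # vec form
direct = abs(sum(wk.conjugate() * hk for wk, hk in zip(w, h))) ** 2

print(lhs, rhs, direct)
```

For a given X, the slack variable of constraint i is then simply
s_i = trace(X Q_i) - c_i, which is non-negative exactly when the
constraint holds.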

SDP problems can be efficiently solved using interior point methods. In
particular, the complexity of solving the above program is at most
$O((M + N)^{6.5})$, and it is usually much less. SeDuMi [6] is a MATLAB
implementation of modern interior point methods for SDP that is
particularly efficient for the moderate-sized problems that are
encountered in our context. Typical run times for realistic choices of N
and M are about 1/10 sec, on a typical desktop computer.

4.  ALGORITHM 

Due to the relaxation, the matrix $\mathbf{X}_{opt}$ obtained through the
SDP will not be rank-one in general. If it is, then its principal
component will be the optimal solution to the original problem. If not,
then the trace of $\mathbf{X}_{opt}$ is a lower bound on the power needed
to satisfy the constraints. This is evident from the fact that we have
removed one of the original problem's constraints. Researchers in
optimization have recently developed ways of generating good solutions to
the original problem from the solution to the relaxed problem,
$\mathbf{X}_{opt}$ [4, 9, 7, 5]. This process is based on randomization:
using $\mathbf{X}_{opt}$ to generate a set of candidate weight vectors,
$\{\mathbf{w}_\ell\}$, from which the "best" solution will be selected.
We consider two methods for generating the $\mathbf{w}_\ell$'s, both of
which have been designed so that their computational cost is negligible
compared to that of computing $\mathbf{X}_{opt}$. (For consistency, the
principal component is also included in the set of candidates.) In the
first method (randA), we calculate the eigen-decomposition of
$\mathbf{X}_{opt} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$ and choose
$\mathbf{w}_\ell$ such that
$\mathbf{w}_\ell = \mathbf{U}\boldsymbol{\Sigma}^{1/2}\mathbf{e}_\ell$,
where $\mathbf{e}_\ell$ is uniformly distributed on the unit sphere. In
the second method (randB), inspired by Tseng [7], we choose
$\mathbf{w}_\ell$ such that
$[\mathbf{w}_\ell]_i = \sqrt{[\mathbf{X}_{opt}]_{ii}}\, e^{j\theta_{\ell,i}}$,
where the $\theta_{\ell,i}$ are independent and uniformly distributed on
$[0, 2\pi)$. In both cases,
$\|\mathbf{w}_\ell\|_2^2 = \mathrm{trace}(\mathbf{X}_{opt})$, and hence
when $\mathrm{rank}(\mathbf{X}_{opt}) > 1$, at least one of the
constraints $|\mathbf{w}_\ell^H \mathbf{h}_i|^2 \ge c_i$ will be
violated. However, a feasible weight vector can be found by simply
scaling $\mathbf{w}_\ell$ so that all the constraints are satisfied. The
"best" of these randomly generated weight vectors is the one that
requires the smallest scaling. The overall approach is summarized in
Table 1. We point out that we have not yet been able to obtain
theoretical a priori bounds on the extent of the sub-optimality of
solutions generated in this way, but our simulation results are quite
encouraging.
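
The randomization and scaling steps described above can be sketched as
follows for the randB option. This is a minimal illustration under stated
assumptions, not the implementation used for the reported results: the
function names are hypothetical, and `X_diag` is a placeholder for the
diagonal of the SDP solution (here filled with made-up values, since no
SDP solver is invoked).

```python
import cmath
import math
import random


def inner_h(w, h):
    """Return w^H h for complex vectors stored as Python lists."""
    return sum(wk.conjugate() * hk for wk, hk in zip(w, h))


def scale_to_feasible(w, channels, c):
    """Scale w by the smallest power factor satisfying all constraints.

    Returns (scaled_w, boost), where boost multiplies ||w||^2.
    """
    worst = min(abs(inner_h(w, h)) ** 2 / ci for h, ci in zip(channels, c))
    if worst <= 0.0:
        return w, math.inf  # candidate has a null towards some user
    boost = 1.0 / worst
    return [math.sqrt(boost) * wk for wk in w], boost


def rand_b(X_diag, rng):
    """randB: [w]_i = sqrt([X]_ii) * exp(j*theta_i), theta_i uniform."""
    return [math.sqrt(x) * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
            for x in X_diag]


def best_candidate(X_diag, channels, c, count=300, seed=0):
    """Keep the randomized candidate that needs the smallest scaling."""
    rng = random.Random(seed)
    best_w, best_boost = None, math.inf
    for _ in range(count):
        scaled, boost = scale_to_feasible(rand_b(X_diag, rng), channels, c)
        if boost < best_boost:
            best_w, best_boost = scaled, boost
    return best_w, best_boost


# Hypothetical toy data; X_diag is NOT an actual SDP solution.
channels = [[1 + 1j, 0.5j], [0.2 + 0j, 1 - 1j]]
c = [1.0, 0.5]
X_diag = [0.5, 0.3]
w_best, boost = best_candidate(X_diag, channels, c, count=50, seed=1)
print(boost, w_best)
```

When `X_diag` comes from a true solution of the relaxed problem, `boost`
is the factor by which the selected beamformer exceeds the SDR lower
bound on transmit power.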

5.  MAX-MIN  FAIR  BEAMFORMING 

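We now switch to an alternative problem that is also of interest. We
consider

$$
\begin{aligned}
\max_{\mathbf{w} \in \mathbb{C}^N} \quad & \min \left\{ |\mathbf{w}^H \mathbf{h}_i|^2 \right\}_{i=1}^{M} \\
\text{subject to:} \quad & \|\mathbf{w}\|_2^2 \le P.
\end{aligned}
$$

It is easy to see that the constraint should be met with equality at an
optimum, for otherwise w could be scaled up, thereby improving the
objective and contradicting optimality. Thus we can focus on the
equality-constrained problem. With a scaling of the optimization variable
$\mathbf{w} = \sqrt{P}\,\tilde{\mathbf{w}}$, the equality-constrained
problem can be written as

$$
\begin{aligned}
\max_{\tilde{\mathbf{w}}} \quad & \min \left\{ P\,|\tilde{\mathbf{w}}^H \mathbf{h}_i|^2 \right\}_{i=1}^{M} \\
\text{subject to:} \quad & \|\tilde{\mathbf{w}}\|_2^2 = 1.
\end{aligned}
$$

It is clear that the solution to this problem simply scales with P; the
solution scales up with $\sqrt{P}$, while the optimum value scales up
with P. We can therefore restrict our attention to the problem (dropping
the tilde for brevity):

$$
\begin{aligned}
\max_{\mathbf{w}} \quad & \min \left\{ |\mathbf{w}^H \mathbf{h}_i|^2 \right\}_{i=1}^{M} \\
\text{subject to:} \quad & \|\mathbf{w}\|_2^2 = 1.
\end{aligned}
$$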

Some discussion is due at this point on the relationship between the two
problem formulations: the original QoS formulation that seeks to minimize
the total transmit power subject to prescribed lower bounds, $c_i$, on
the received signal powers; and the max-min "fair" formulation that seeks
to maximize the received signal power of the weakest user subject to an
overall transmit power constraint. Suppose that all $c_i$'s are equal to
$c$, and the QoS formulation yields a beamformer $\mathbf{w}_q$, and
associated minimum transmit power $P_q$. Then we can scale the solution
of the max-min fair beamformer to power $P_q$, and this scaled max-min
fair solution, denoted $\mathbf{w}_f$, will be an optimal solution to

$$
\begin{aligned}
\max_{\mathbf{w}} \quad & \min \left\{ |\mathbf{w}^H \mathbf{h}_i|^2 \right\}_{i=1}^{M} \\
\text{subject to:} \quad & \|\mathbf{w}\|_2^2 = P_q.
\end{aligned}
$$

As a result, since $\mathbf{w}_q$ already attains
$|\mathbf{w}_q^H \mathbf{h}_i|^2 \ge c, \; \forall i$, it follows that
$|\mathbf{w}_f^H \mathbf{h}_i|^2 \ge c, \; \forall i$. Hence
$\mathbf{w}_f$ also satisfies the constraints of the QoS formulation, and
at the same power as $\mathbf{w}_q$. It follows that $\mathbf{w}_f$ is
equivalent to $\mathbf{w}_q$. This shows that

Claim 1 The QoS problem formulation and the max-min fair problem
formulation are equivalent in the case that all the $c_i$'s are equal.

When the $c_i$'s are different, however, the two problem formulations
generally yield different beamformers. Claim 1 implies an indirect way of
solving the max-min fair problem:

Corollary 1 One way to solve the max-min fair problem is to solve the QoS
problem with $c_i = 1$, $\forall i \in \{1, \dots, M\}$, then scale the
resulting solution to the desired power P.
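
The scaling step of Corollary 1 amounts to one line of arithmetic, as the
following sketch suggests. The function names are hypothetical, and
`w_q` below is only a stand-in vector rather than an actual QoS optimum;
the point is to show how power and minimum received signal power change
under the rescaling.

```python
import math


def inner_h(w, h):
    """Return w^H h for complex vectors stored as Python lists."""
    return sum(wk.conjugate() * hk for wk, hk in zip(w, h))


def power(w):
    return sum(abs(wk) ** 2 for wk in w)


def min_received_power(w, channels):
    """Received signal power of the weakest user."""
    return min(abs(inner_h(w, h)) ** 2 for h in channels)


def rescale_to_power(w_q, P):
    """Scale a QoS beamformer (computed with all c_i = 1) to power P."""
    gain = math.sqrt(P / power(w_q))
    return [gain * wk for wk in w_q]


# Hypothetical toy data; w_q is a stand-in, not a computed optimum.
channels = [[1 + 1j, 0.5j], [0.2 + 0j, 1 - 1j]]
w_q = [0.6 + 0.6j, 0.4 - 0.3j]
P = 2.0
w_f = rescale_to_power(w_q, P)
print(power(w_f), min_received_power(w_f, channels))
```

If `w_q` is a true QoS optimum with all thresholds equal to 1 and
minimum power P_q, the weakest user then receives signal power P / P_q
after rescaling.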

6.  THE  CASE  OF  FREQUENCY-SELECTIVE 
MULTIPATH 

Although we have focused our attention so far on frequency-flat fading
channels, the situation is quite similar for frequency-selective
(intersymbol-interference) channels. Let $\mathbf{h}_i^{(\ell)}$ denote
the $\ell$-th $N \times 1$ vector tap of the baseband-equivalent
discrete-time impulse response of the multipath channel between the
transmitter antenna array and the (single) receive antenna of receiver i.
Assume that delay spread is limited to L non-zero vector channel taps.
Define the channel matrix for the i-th receiver as

$$
\mathbf{H}_i := \left[ \mathbf{h}_i^{(1)}, \dots, \mathbf{h}_i^{(L)} \right].
$$

Beamforming the transmit array with a fixed (time-invariant)
$\mathbf{w}^H$ yields a scalar equivalent channel from the viewpoint of
the i-th receiver, whose scalar taps are given by

$$
\tilde{h}_i^{(\ell)} = \mathbf{w}^H \mathbf{h}_i^{(\ell)}, \quad \ell = 1, \dots, L,
$$

or, in vector form,

$$
\tilde{\mathbf{h}}_i^T = \mathbf{w}^H \mathbf{H}_i.
$$

Now, if a Viterbi equalizer is used for sequence estimation at the
receiver, then the parameter that determines performance is [3]:

$$
\|\tilde{\mathbf{h}}_i\|_2^2 = \mathbf{w}^H \mathbf{H}_i \mathbf{H}_i^H \mathbf{w}
= \mathrm{trace}(\mathbf{w}\mathbf{w}^H \mathbf{H}_i \mathbf{H}_i^H)
= \mathrm{trace}(\mathbf{w}\mathbf{w}^H \mathbf{Q}_i),
$$

where $\mathbf{Q}_i := \mathbf{H}_i \mathbf{H}_i^H$. Therefore, both the
QoS and max-min "fair" problems naturally extend to the
frequency-selective case. While $\mathbf{Q}_i$ is generally of higher
rank than in the flat-fading case, the principles of relaxation can be
applied in an analogous manner to generate an approximation of the
optimal w.

7.  INSIGHTS  AFFORDED  VIA  DUALITY

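Let us return to our original problem:

$$
\begin{aligned}
\min \quad & \|\mathbf{w}\|_2^2 \\
\text{subject to:} \quad & |\mathbf{w}^H \mathbf{h}_i|^2 \ge c_i, \quad i \in \{1, \dots, M\}.
\end{aligned}
$$

We can convert the problem to real-valued form; this yields a
$2N \times 1$ vector of real variables,
$\mathbf{x} := [\mathrm{Re}\{\mathbf{w}\}^T \; \mathrm{Im}\{\mathbf{w}\}^T]^T$,
and the $\mathbf{Q}_i$'s are now $2N \times 2N$ symmetric matrices of
rank 2:
$\mathbf{Q}_i := \mathbf{g}_i \mathbf{g}_i^T + \bar{\mathbf{g}}_i \bar{\mathbf{g}}_i^T$,
where
$\mathbf{g}_i := [\mathrm{Re}\{\mathbf{h}_i\}^T \; \mathrm{Im}\{\mathbf{h}_i\}^T]^T$,
and
$\bar{\mathbf{g}}_i := [\mathrm{Im}\{\mathbf{h}_i\}^T \; -\mathrm{Re}\{\mathbf{h}_i\}^T]^T$.
Then our original problem can be written as:

$$
\mathcal{P}: \quad
\begin{aligned}
\min \quad & \mathbf{x}^T \mathbf{x} \\
\text{subject to:} \quad & \mathbf{x}^T \mathbf{Q}_i \mathbf{x} \ge c_i, \quad i \in \{1, \dots, M\}.
\end{aligned}
$$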
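
The equivalence between the complex and real-valued forms can be checked
numerically. The sketch below, with hypothetical helper names and toy
values, builds x, g and g-bar as defined above and confirms that
x^T Q x = |w^H h|^2 and x^T x = ||w||^2; the two inner products x^T g
and x^T g-bar are the real and imaginary parts of w^H h.

```python
def real_form(w, h):
    """Return (x, g, g_bar) for complex vectors w and h."""
    x = [z.real for z in w] + [z.imag for z in w]
    g = [z.real for z in h] + [z.imag for z in h]
    g_bar = [z.imag for z in h] + [-z.real for z in h]
    return x, g, g_bar


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def quad_form(x, g, g_bar):
    """x^T (g g^T + g_bar g_bar^T) x, without forming the matrix."""
    return dot(x, g) ** 2 + dot(x, g_bar) ** 2


# Hypothetical toy data with N = 2.
w = [0.6 + 0.6j, 0.4 - 0.3j]
h = [1 + 1j, 0.5j]
x, g, g_bar = real_form(w, h)

complex_value = abs(sum(wk.conjugate() * hk for wk, hk in zip(w, h))) ** 2
print(quad_form(x, g, g_bar), complex_value)
```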


It can be shown that the (Lagrange) dual of problem $\mathcal{P}$ is a
Semi-Definite Program (SDP). The dual problem is interesting, because it
generates a lower bound on the minimum objective value of the original
problem [2]. The dual problem is convex by virtue of its definition. This
means that we can solve the dual problem and thus obtain the tightest
bound obtainable via duality. This duality-derived bound can be compared
to the SDR bound we used earlier. Let $\mathcal{D}(\cdot)$
($\beta(\cdot)$) denote the dual (respectively, minimum) of a certain
minimization problem, and let $\mathcal{R}(\mathcal{P})$ denote the
semidefinite relaxation of $\mathcal{P}$, obtained by dropping the
associated rank-one constraint. It can be shown that

Claim 2 $\mathcal{D}(\mathcal{D}(\mathcal{P})) = \mathcal{R}(\mathcal{P})$;
and $\beta(\mathcal{R}(\mathcal{P})) = \beta(\mathcal{D}(\mathcal{P}))$.
That is, semidefinite relaxation yields the duality bound for
$\mathcal{P}$, and the corresponding gap is equal to the duality gap.

Claim 2 along with Claim 1 directly yields the following corollary:

Corollary 2 Let $\mathcal{F}$ denote the max-min fair problem
formulation. Then
$\mathcal{D}(\mathcal{D}(\mathcal{F})) = \mathcal{R}(\mathcal{F})$; and
$\beta(\mathcal{R}(\mathcal{F})) = \beta(\mathcal{D}(\mathcal{F}))$.
Thus, semidefinite relaxation yields the duality bound for $\mathcal{F}$,
and the corresponding gap is equal to the duality gap.

8.  SIMULATION  RESULTS 

Simulation results are presented in Fig. 1 and Tables 2, 3, and 4.

Table 2 summarizes the results obtained using the algorithm in Table 1
with the randA option for randomization. Table 3 summarizes the results
obtained using the algorithm in Table 1 and both randA and randB
randomizations. In this case, the best of the two solutions (in the sense
of minimizing the power boost relative to the lower bound provided by
SDR) is selected in each Monte-Carlo (MC) run. The captions are otherwise
self-contained. Note that, in many cases, our solutions are within 3-4 dB
from the (generally conservative) lower bound on transmit power provided
by SDR, and thus are guaranteed to be at most 3-4 dB away from optimal;
this is often good enough from an engineering perspective. In several
cases the solutions are essentially optimal. This is illustrated in
Figure 1, which shows the optimized transmit beam pattern for a
particular far-field multicasting scenario using a Uniform Linear antenna
Array (ULA); the details of the simulation setup are included in the
figure captions for ease of reference.

Table 4 summarizes our simulation results for max-min fair beamforming.
Table 4 presents averages for the upper bound on minimum SNR (the optimum
attained by SDP without regard to the rank-one constraint), the
SDR-attained minimum SNR (after randomization), and the minimum SNR for
the case of no beamforming. For the latter, we have used
$\mathbf{w} = \frac{1}{\sqrt{N}} \mathbf{1}_{N \times 1}$, which fixes
transmit power to 1. The number of post-SDR randomizations was set to
30NM. (This time a function of N, M.) It is satisfying to note that the
SDR solution attains a significant fraction of the (possibly
unattainable) upper bound. Furthermore, SDR provides substantial gains
over not beamforming at all.

We observe from Tables 2-4 that, as N and/or M increase, the quality of
the solution generated by the semidefinite relaxation degrades a little.
The reasons for this degradation are under investigation, but possible
causes include implementation issues, such as the number of
randomizations and the nature of the randomization strategy, and more
fundamental issues, such as the potential for a mild degradation of the
approximation quality of the method as the problem size grows. (In a
related, but distinct, problem the quality of the SDR approximation
degrades logarithmically in the problem size [5].)

9.  CONCLUSIONS 

We have taken a new look at the broadcasting/multicasting problem when
channel state information is available at the transmitter. We have
formulated the problem of minimizing the transmit power under multiple
SNR constraints, and we have shown how its solution can often be
well-approximated using semidefinite relaxation tools. We have also
considered a max-min fair problem formulation. For both formulations,
semidefinite relaxation yields a bound on the degree of suboptimality
that is actually equal to the optimum Lagrange dual bound. This
justifies, to a certain extent, the approximation introduced by
relaxation. Still, it would be nice to analyze the duality gap for the
problem at hand, for this would yield a priori bounds on the degree of
suboptimality introduced by relaxation, as opposed to the a posteriori
bound that we now have by virtue of Claim 2. For the time being, our
simulation results indicate that the degree of suboptimality is often
within 3-4 dB, on average, which is acceptable in our intended
applications.

There are many interesting refinements and extensions to this work. These
include potentially better randomization strategies, robustness issues,
and extensions to multiple co-channel multicasting groups. These are the
subjects of on-going work, and will be reported elsewhere.

Table 1. Broadcast Beamforming via SDR: Algorithm

•  Solve the relaxed problem of Section 3.

   A suitable MATLAB interface for SeDuMi is as follows:

   % H is N by M, holding the channel vectors;
   % constraints is M by 1, holding the Rx power constraints
   vecQs = [];
   for i=1:M,
       Qi = H(:,i)*H(:,i)';
       vecQs = [vecQs vec(Qi.')];
   end
   A = [-eye(M), vecQs.'];
   b = constraints;
   c = [zeros(M,1); vec(eye(N))];
   K.l = M; K.s = N; K.scomplex = 1;
   [x_opt, y_opt, info] = sedumi(A,b,c,K);
   X_opt = mat(x_opt(M+1:end));

•  Randomization:

   Use randA, or randB, as described in Section 4.
   It is often preferable to run both and pick the best result.


Table 2. MC simulation results: mean and standard deviation of upper
bound on power boost. H is circularly symmetric complex i.i.d. Gaussian
(Rayleigh) of variance 1. randA randomization only. # post-SDR
randomizations = 300. The symbol U indicates that Rx power constraints
are uniformly distributed random variables in [0, 1], and redrawn for
each MC run; 1 means that all Rx power constraints are fixed to 1.
# MC-runs = 300.

N/M     mean (U)   std (U)   mean (1)   std (1)
4/8     1.14       0.27      1.30       0.36
4/16    1.63       0.55      1.96       0.62
8/16    2.11       0.65      2.54       0.68
8/32    3.20       0.79      3.77       0.93

Table 3. MC simulation results: mean and standard deviation of upper
bound on power boost. Here, the best result from two randomization
techniques (randA, randB) is chosen for each MC run. # post-SDR
randomizations = 1000. # MC-runs = 1000. The remaining parameters are as
in Table 2.

N/M     mean (U)   std (U)   mean (1)   std (1)
4/8     1.07       0.12      1.15       0.17
4/16    1.32       0.26      1.49       0.30
8/16    1.72       0.34      2.06       0.34
8/32    2.51       0.43      2.96       0.44

Table 4. MC simulation results for max-min fair beamforming: averages for
upper bound on $\min_i \mathrm{SNR}_i$, relaxation-attained
$\min_i \mathrm{SNR}_i$, and the $\min_i \mathrm{SNR}_i$ for the case of
no beamforming. The results are averaged over 1000 MC runs. For each MC
run, H is re-drawn from a circularly symmetric complex i.i.d. Gaussian
distribution of variance 1. The best result from two randomization
techniques (randA, randB) is chosen for each MC run. # post-SDR
randomizations = 30NM. P = 1.

N/M     upper bound   SDR    no BMF
4/8     1.05          0.92   0.12
4/16    0.73          0.48   0.06
8/16    1.43          0.72   0.06
8/32    1.07          0.37   0.03

[Figure 1: transmit beam pattern plot. Scenario: 6 clusters of 4 users
each at [-51, -31, -11, 11, 31, 51] deg.]


Fig. 1. Broadcast beamforming example using Algorithm in Table 1.
N = 8-element Tx ULA ($d/\lambda = 1/2$); M = 24 downlink users, in 6
clusters of 4 users each. Clusters centered at
[-51, -31, -11, 11, 31, 51]°, ±2°. Symmetric lobes appear due to the
inherent ULA ambiguity. All Rx power constraints set to 1. randA, #
post-SDR randomizations = 300. In this case, the solution is guaranteed
to be within 0.1% of the optimum.




LOW-COMPLEXITY  DOWNLINK  BEAMFORMING  FOR  MAXIMUM  SUM  CAPACITY 


Goran  Dimic 

Dept. of ECE, Univ. of Minnesota,
Minneapolis MN 55455, U.S.A.

E-mail: goran@ece.umn.edu

ABSTRACT 

The problem of simultaneous multiuser downlink beamforming has recently
attracted significant interest in both the Information Theory and Signal
Processing communities. The idea is to employ a transmit antenna array to
create multiple 'beams' directed towards the individual users, and the
aim is to increase throughput, measured by sum capacity. Optimal
solutions to this problem require convex optimization and so-called Dirty
Paper (DP) precoding for known interference, which are prohibitively
complex for actual online implementation at the base station. Motivated
by recent results by Viswanathan et al. and Caire and Shamai, we propose
a computationally simple user selection method coupled with zero-forcing
beamforming. Our results indicate that the proposed method attains a
significant fraction of sum capacity, and thus offers an attractive
alternative to DP-based schemes.

1.  INTRODUCTION 

Depending on whether or not Channel State Information (CSI) is available
at the transmitter, transmit antenna arrays can be utilized in two basic
ways or a combination thereof: space-time coding, and spatial
multiplexing. The former can be used without CSI at the transmitter, and
allows mitigation and exploitation of fading. The latter requires CSI at
the transmitter, but in turn allows for much higher throughput. Until
recently, transmit beamforming was mostly considered for voice services
in the context of the cellular downlink. With the emergence of 3G and 4G
systems, higher emphasis is being placed on packet data, which are more
delay-tolerant but require much higher throughput. Hence the recent
interest in transmit beamforming strategies for the cellular downlink
that aim for attaining the sum capacity of the wireless channel
[1, 8, 9, 4, 6, 7, 5].

The scenario of interest can be modeled as a non-degraded Gaussian
broadcast channel (GBC). Let N be the number of antennas at the
transmitter (Base Station (BS) in a cellular context), and consider a
cluster of M mobile users, each equipped with a single receive antenna.
The channel between each transmit and receive antenna is constant over a
certain time interval and known at the BS. The received signal is
corrupted by AWGN independent across users. The BS may transmit
simultaneously, using multiple transmit beams, to more than one user in
the cluster.

Since the receivers cannot cooperate, successful transmission critically
depends on the transmitter's ability to simultaneously send independent
signals with as small interference between them as possible. Caire and
Shamai [1] proposed a multiplexing technique based on coding for known
interference, known as "Writing on Dirty Paper" or Costa precoding [2].
In [2], it is proven that in an AWGN channel with additional additive
Gaussian interference, which is known at the transmitter in advance
(non-causally), it is possible to achieve the same capacity as if there
were no interference. Assuming Costa precoding and known channels at the
transmitter, Vishwanath et al. [6] and Yu and Cioffi [9] have proposed
algorithms that evaluate sum capacity of the GBC along with the
associated optimal signal covariance matrix. However, both approaches
require convex optimization in (order of) MN variables to find the
optimal signal covariance matrix.

The complexity of the proposed optimization algorithms makes them
unsuitable for actual implementation at the BS. A reduced-complexity
suboptimal solution to sum rate maximization is proposed in [1]. It
suggests the use of QR decomposition of the channel matrix combined with
dirty paper (DP) coding at the transmitter. The combined approach nulls
interference between data streams, and hence, it is named zero-forcing
dirty-paper (ZF-DP) precoding. If N > M, ZF-DP is proven to be
asymptotically optimal at both low and high SNR, but suboptimal in
general; whereas zero-forcing (ZF) beamforming without DP coding is
optimal in the low SNR regime and yields the same slope of throughput
versus SNR in decibels as the sum capacity curve at high SNR. If N < M,
[1] has shown that random selection of U < N users incurs throughput loss
for both ZF-DP and ZF. Tu and Blum [5] have proposed a selection
algorithm that capitalizes on multiuser diversity, thus increasing the
throughput of ZF-DP precoding, and significantly narrowing the gap
between ZF-DP throughput and capacity.

An important shortcoming of DP coding is that it requires vector coding
and a long temporal block length to be well-approximated in practice;
furthermore, with current state-of-art, such approximation entails high
computational complexity [3, 8, 10]. For this reason, we advocate herein
a more pragmatic approach, based on plain ZF beamforming coupled with a
new user selection method. Our approach is applicable in the practically
important case that the number of users exceeds the number of transmit
antennas. Our simulation results indicate that, at moderate and high SNR,
the proposed approach has equal slope of throughput versus SNR as the
capacity curve, and it achieves a significant fraction of capacity for
all SNR.

ZF beamforming without DP coding was also considered by Spencer and
Haardt [4], but they did not consider user selection when M > N.
Viswanathan et al. [7] have compared the performance of ZF versus ZF-DP,
using a simpler user selection scheme that schedules the N users with the
highest individual SINR. Under this simpler scheme, they reported that ZF
is close to ZF-DP in terms of throughput. Our results further qualify
[7], showing that the same is true under a more sophisticated user
selection strategy that directly aims to optimize sum capacity.
Furthermore, we show that with this new user selection strategy ZF comes
close to attaining sum capacity.

2.  ZERO-FORCING  BEAMFORMING  AND  USER 
SELECTION  STRATEGY 

Let $h_{m,n}$ model the quasi-static, flat-fading channel between
transmit antenna n and the receive antenna of user m, and denote
$\mathbf{h}_m := [h_{m,1} \; h_{m,2} \; \dots \; h_{m,N}]$. Similarly,
let $\mathbf{w}_m = [w_{1,m} \; w_{2,m} \; \cdots \; w_{N,m}]^T$
($(\cdot)^T$ denotes transpose) be the beamforming weight vector for user
m. Thus the channel matrix, H, and the beamforming weight matrix, W, are

$$
\mathbf{H} = [\mathbf{h}_1^* \;\; \mathbf{h}_2^* \;\; \cdots \;\; \mathbf{h}_M^*]^*, \tag{1}
$$

$$
\mathbf{W} = [\mathbf{w}_1 \;\; \mathbf{w}_2 \;\; \cdots \;\; \mathbf{w}_M],
$$

where $(\cdot)^*$ denotes conjugate-transpose. Collecting the
baseband-equivalent outputs, the received signal vector is

$$
\mathbf{x} = \mathbf{H}\mathbf{W}\mathbf{D}\mathbf{s} + \mathbf{n}, \tag{2}
$$

where s is the transmitted signal vector containing uncorrelated
unit-power entries,

$$
\mathbf{D} =
\begin{bmatrix}
\sqrt{p_1} & 0 & \cdots & 0 \\
0 & \sqrt{p_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{p_M}
\end{bmatrix} \tag{3}
$$

accounts for power-loading and n is the noise vector. Note that the
elements of x are physically distributed across the M mobile terminals.
Multiuser decoding is therefore not feasible, hence each user treats the
signals intended for other users as interference. Noise is assumed to be
circular complex Gaussian, zero-mean, uncorrelated with variance of each
complex entry $\sigma^2 = 1$.

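The desired signal power received by user m is given by
$|\mathbf{h}_m \mathbf{w}_m|^2 p_m$. The Signal to Interference plus
Noise Ratio (SINR) of user m is

$$
\mathrm{SINR}_m = \frac{|\mathbf{h}_m \mathbf{w}_m|^2 p_m}{\sum_{i \ne m} |\mathbf{h}_m \mathbf{w}_i|^2 p_i + \sigma^2}. \tag{4}
$$

The problem of interest can now be formulated as

$$
\begin{aligned}
\max_{\mathbf{W}} \quad & \sum_{m=1}^{M} \log(1 + \mathrm{SINR}_m), \\
\text{subject to:} \quad & \|\mathbf{W}\mathbf{D}\|_F^2 \le P,
\end{aligned} \tag{5}
$$

where $\|\cdot\|_F$ denotes Frobenius norm and P stands for a bound on
average transmitted power.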
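
To make the objective concrete, the following sketch evaluates the
per-user SINR, the sum rate, and the transmit power for given beamforming
weights and power loading. The function names and the toy values are
hypothetical, and base-2 logarithms (rates in bits per channel use) are
an assumption, since the formulation above does not fix the base.

```python
import math


def row_times_col(h, w):
    """h_m w_i for a row vector h_m and a column vector w_i."""
    return sum(a * b for a, b in zip(h, w))


def sinr(m, H, W, p, noise_var=1.0):
    """SINR of user m. H: list of M rows; W: list of M weight vectors."""
    signal = abs(row_times_col(H[m], W[m])) ** 2 * p[m]
    interference = sum(abs(row_times_col(H[m], W[i])) ** 2 * p[i]
                       for i in range(len(W)) if i != m)
    return signal / (interference + noise_var)


def sum_rate(H, W, p, noise_var=1.0):
    """Sum of log2(1 + SINR_m) over all users (base 2 is an assumption)."""
    return sum(math.log2(1.0 + sinr(m, H, W, p, noise_var))
               for m in range(len(W)))


def total_power(W, p):
    """Squared Frobenius norm of W D, i.e. sum_m p_m ||w_m||^2."""
    return sum(pm * sum(abs(v) ** 2 for v in wm) for wm, pm in zip(W, p))


# Hypothetical toy data: N = 2 antennas, M = 2 users.
H = [[1, 0], [0, 1]]   # rows h_1, h_2
W = [[1, 0], [1, 1]]   # weight vectors w_1, w_2
p = [3.0, 1.0]

print(sinr(0, H, W, p), sinr(1, H, W, p),
      sum_rate(H, W, p), total_power(W, p))
```

In this toy case user 1 suffers interference from the beam of user 2,
while user 2 does not see the beam of user 1.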

Attaining  capacity  requires  Gaussian  signaling  and  long 
codes,  yet  the  logarithmic  SINR  reward  can  be  motivated 
from  other,  more  practical  perspectives  as  well:  it  can  be 
shown  that  it  measures  the  throughput  of  QAM-modulated 
systems  over  both  AWGN  and  Rayleigh  fading  channels. 
The  intuition  is  that  SINR  improvements  eventually  yield 
diminishing  throughput  returns. 

ZF beamforming inverts the channel matrix at the transmitter,
so that orthogonal channels between transmitter and
receivers are created. It is then possible to encode users
individually, as opposed to the more complex long-block-vector
coding needed to implement DP. Note that ZF at the transmitter
does not enhance noise at the receiver. If the number
of users $M \le N$ and $\mathrm{rank}(\mathbf{H}) = M$, then the ZF
beamforming matrix is

$$\mathbf{W} = \mathbf{H}^*(\mathbf{H}\mathbf{H}^*)^{-1}, \tag{6}$$

which is the Moore-Penrose pseudoinverse of the channel
matrix. However, if $M > N$ it is not possible to use (6)
because $\mathbf{H}\mathbf{H}^*$ is singular. In that case, one needs to select
$n \le N$ out of $M$ users.
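The two claims above can be checked numerically. The sketch below builds the ZF beamformer of (6) for a randomly drawn channel with $M \le N$, confirms that it diagonalizes the channel and coincides with the pseudoinverse, and then shows that the Gram matrix loses rank when $M > N$. The dimensions, the random seed, and the helper name `zf_beamformer` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_channel(M, N):
    """i.i.d. complex Gaussian channel with unit-variance entries (illustrative)."""
    return (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

def zf_beamformer(H):
    """W = H^* (H H^*)^{-1}, valid when H has full row rank."""
    Hh = H.conj().T
    return Hh @ np.linalg.inv(H @ Hh)

# Case M <= N: ZF creates orthogonal channels, H W = I.
H = random_channel(3, 4)
W = zf_beamformer(H)
orthogonal = np.allclose(H @ W, np.eye(3))
matches_pinv = np.allclose(W, np.linalg.pinv(H))

# Case M > N: the M x M Gram matrix H H^* has rank at most N, so it is singular.
H_big = random_channel(6, 4)
gram_rank = np.linalg.matrix_rank(H_big @ H_big.conj().T)
```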

For $M > N$, the problem is reformulated as follows:
Let $U = \{1, 2, \ldots, M\}$, and $S_n = \{s_u \mid s_u \in U\}$,
such that $|S_n| = n$. Given $\mathbf{H} \in \mathbb{C}^{M \times N}$, select $n \le N$,
and a set of channels, $\{\mathbf{h}_{s_1}, \ldots, \mathbf{h}_{s_n}\}$, which produce the
row-reduced channel matrix

$$\mathbf{H}(S_n) = [\,\mathbf{h}_{s_1}^*\ \ \mathbf{h}_{s_2}^*\ \cdots\ \mathbf{h}_{s_n}^*\,]^* \tag{7}$$

such that the sum rate is the highest achievable:

$$\max_{1 \le n \le N}\ \max_{S_n}\ R_{zf}(S_n) \quad \text{subject to} \quad \sum_{i \in S_n} \left[\mu - \frac{1}{c_i(S_n)}\right]_+ = P. \tag{8}$$

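We define

$$R_{zf}(S_n) := \sum_{i \in S_n} \left[\log_2\big(\mu\, c_i(S_n)\big)\right]_+, \tag{9}$$

where $[x]_+ = \max\{0, x\}$,

$$c_i(S_n) = \left\{\left[\big(\mathbf{H}(S_n)\mathbf{H}(S_n)^*\big)^{-1}\right]_{i,i}\right\}^{-1}, \tag{10}$$

and $\mu$ is obtained by solving the water-filling equation in
(8). The power-loading then yields

$$p_i = c_i(S_n)\left[\mu - \frac{1}{c_i(S_n)}\right]_+, \quad \forall i \in S_n. \tag{11}$$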
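A minimal sketch of how (8) through (11) might be evaluated for a fixed user set is given below. The function names `zf_gains` and `water_fill` are hypothetical, and the search over the number of active users is one standard way to solve the water-filling equation, not necessarily the procedure used by the authors.

```python
import numpy as np

def zf_gains(Hs):
    """Effective gains c_i of eq. (10) for the row-reduced channel Hs = H(S_n)."""
    A_inv = np.linalg.inv(Hs @ Hs.conj().T)
    return 1.0 / np.real(np.diag(A_inv))

def water_fill(c, P):
    """Solve sum_i [mu - 1/c_i]_+ = P; return mu, powers p_i (eq. 11), rate (eq. 9)."""
    c = np.asarray(c, dtype=float)
    inv_sorted = np.sort(1.0 / c)              # ascending, strongest users first
    for k in range(len(c), 0, -1):             # try k active users, largest k first
        mu = (P + inv_sorted[:k].sum()) / k
        if mu > inv_sorted[k - 1]:             # weakest of the k users still gets power
            break
    p = c * np.maximum(mu - 1.0 / c, 0.0)
    rate = np.sum(np.maximum(np.log2(mu * c), 0.0))
    return mu, p, rate

# Both users active: mu = 3, p = [2, 0.5], and sum(p / c) equals the budget P = 3.
mu, p, rate = water_fill([1.0, 0.5], P=3.0)

# Weak second user is switched off: mu = 2, p = [1, 0], rate = 1 bit per channel use.
mu_low, p_low, rate_low = water_fill([1.0, 0.1], P=1.0)
```

Because $\mathbf{H}(S_n)\mathbf{W} = \mathbf{I}$ under ZF and $\sigma^2 = 1$, the SINR of user $i$ equals $p_i$, so the rate in (9) coincides with $\sum_i \log_2(1 + p_i)$.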


The problem can be conceptually solved by exhaustive
search: for each value of $n$, find all possible $n$-tuples $S_n$
and select a pair $(n, S_n)$ which yields maximum $R_{zf}(S_n)$.
However, such an algorithm has prohibitive complexity.
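To give a sense of scale, the short sketch below counts the user sets an exhaustive search would have to evaluate and compares this with an upper bound on the candidate evaluations of a greedy search that adds one user at a time. Both function names and the example sizes are illustrative assumptions.

```python
from math import comb

def num_candidate_sets(M, N):
    """Number of user sets S_n with 1 <= n <= N drawn from M users."""
    return sum(comb(M, n) for n in range(1, N + 1))

def greedy_evaluations(M, N):
    """Upper bound on candidates examined when users are added one at a time."""
    return sum(M - n + 1 for n in range(1, N + 1))

exhaustive = num_candidate_sets(20, 4)   # 20 + 190 + 1140 + 4845 = 6195
greedy = greedy_evaluations(20, 4)       # 20 + 19 + 18 + 17 = 74
```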

We propose a reduced-complexity suboptimal algorithm,
dubbed Generalized Zero Forcing (GZF), as outlined
next.


1. Initialization:

   • Set $n = 1$.

   • Find a user, $s_1$, such that $s_1 = \arg\max_{u \in U} \mathbf{h}_u \mathbf{h}_u^*$.

   • Set $S_1 = \{s_1\}$ and denote the achieved rate $R_{zf}(S_1)_{max}$.

2. While $n < N$:

   • $n = n + 1$.

   • Find a user, $s_n$, such that
     $s_n = \arg\max_{u \in U \setminus S_{n-1}} R_{zf}(S_{n-1} \cup \{u\})$.

   • Set $S_n = S_{n-1} \cup \{s_n\}$ and denote the achieved
     rate $R_{zf}(S_n)_{max}$.

   • If $R_{zf}(S_n)_{max} \le R_{zf}(S_{n-1})_{max}$, break and
     retain solution $(n - 1, S_{n-1})$.

3. Beamforming: $\mathbf{W} = \mathbf{H}(S_n)^*\big(\mathbf{H}(S_n)\mathbf{H}(S_n)^*\big)^{-1}$.

   Power Loading: Water-filling.
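The three steps above can be sketched as follows. This is a plain reference version under stated assumptions: it recomputes every matrix inverse from scratch instead of using a recursive update, it does not pre-screen candidates, and all function names are hypothetical.

```python
import numpy as np

def zf_gains(Hs):
    """c_i = 1 / [(Hs Hs^*)^{-1}]_{ii} for the selected rows Hs."""
    return 1.0 / np.real(np.diag(np.linalg.inv(Hs @ Hs.conj().T)))

def water_fill(c, P):
    """Water level mu and powers p_i = c_i [mu - 1/c_i]_+ for budget P."""
    inv_sorted = np.sort(1.0 / c)
    for k in range(len(c), 0, -1):
        mu = (P + inv_sorted[:k].sum()) / k
        if mu > inv_sorted[k - 1]:
            break
    return mu, c * np.maximum(mu - 1.0 / c, 0.0)

def zf_rate(H, S, P):
    """R_zf(S) = sum_i [log2(mu c_i)]_+ for user set S."""
    c = zf_gains(H[S, :])
    mu, _ = water_fill(c, P)
    return np.sum(np.maximum(np.log2(mu * c), 0.0))

def gzf(H, P):
    M, N = H.shape
    # Step 1: start with the user whose channel has the largest norm.
    s1 = int(np.argmax(np.sum(np.abs(H) ** 2, axis=1)))
    S, best = [s1], zf_rate(H, [s1], P)
    # Step 2: add the user that maximizes the ZF sum rate; stop when the rate stalls.
    while len(S) < min(N, M):
        rates = {u: zf_rate(H, S + [u], P) for u in range(M) if u not in S}
        u_best = max(rates, key=rates.get)
        if rates[u_best] <= best:
            break                      # retain the previous set
        S.append(u_best)
        best = rates[u_best]
    # Step 3: ZF beamforming and water-filling on the selected users.
    Hs = H[S, :]
    W = Hs.conj().T @ np.linalg.inv(Hs @ Hs.conj().T)
    _, p = water_fill(zf_gains(Hs), P)
    return S, W, p, best

rng = np.random.default_rng(1)
H = (rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))) / np.sqrt(2)
S, W, p, rate = gzf(H, P=10.0)
```

The returned `W` has one column per selected user, so `H[S, :] @ W` is the identity and the transmitted power `sum(p / c)` meets the budget with equality.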

2.1.  Implementation  and  Complexity 

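The most complex task is the evaluation of $R_{zf}(S_{n-1} \cup \{u\})$.
From (9), it is split into the evaluation of the $c_i(S_{n-1} \cup \{u\})$'s
followed by evaluation of $\mu$. An efficient way to
evaluate the $c_i(S_{n-1} \cup \{u\})$'s is by using the matrix inversion
lemma to invert the matrix
$\mathbf{A}(S_{n-1} \cup \{u\}) := \mathbf{H}(S_{n-1} \cup \{u\})\,\mathbf{H}(S_{n-1} \cup \{u\})^*$. Note that

$$\mathbf{A}(S_{n-1} \cup \{u\}) = \begin{bmatrix} \mathbf{A}(S_{n-1}) & \mathbf{a}_u \\ \mathbf{a}_u^* & a_{u,u} \end{bmatrix},$$

where $\mathbf{a}_u = [\mathbf{h}_{s_1}\mathbf{h}_u^*,\ \mathbf{h}_{s_2}\mathbf{h}_u^*,\ \ldots,\ \mathbf{h}_{s_{n-1}}\mathbf{h}_u^*]^T$ and
$a_{u,u} = \mathbf{h}_u \mathbf{h}_u^*$. Noting that $\mathbf{A}(S_{n-1})^* = \mathbf{A}(S_{n-1})$, and writing

$$\mathbf{q} = \mathbf{A}(S_{n-1})^{-1}\mathbf{a}_u, \tag{12}$$

after some algebraic manipulation we obtain

$$\mathbf{A}(S_{n-1} \cup \{u\})^{-1} = \begin{bmatrix} \mathbf{A}(S_{n-1})^{-1} & \mathbf{0}_{n-1} \\ \mathbf{0}_{n-1}^T & 0 \end{bmatrix} + \frac{1}{a_{u,u} - \mathbf{a}_u^*\mathbf{q}} \begin{bmatrix} \mathbf{q}\mathbf{q}^* & -\mathbf{q} \\ -\mathbf{q}^* & 1 \end{bmatrix}, \tag{13}$$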


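where $\mathbf{0}_{n-1}^T = [0\ 0\ \ldots\ 0]_{1 \times (n-1)}$. It can be verified that
each time $n$ is increased, $\mathbf{A}(S_{n-1})^{-1}$ and the inner products
$\mathbf{h}_{s_i}\mathbf{h}_u^*$, $s_i \in S_{n-2}$,
are known before the search over $u \in U \setminus S_{n-1}$ starts.
Hence, evaluation of $\mathbf{A}(S_{n-1} \cup \{u\})^{-1}$ from (12) and (13)
has complexity $O(n^2)$.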
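The block update in (12) and (13) can be checked against a direct inverse. In the sketch below, `extend_inverse` is a hypothetical helper that takes the previous inverse, the cross-correlation vector $\mathbf{a}_u$ and the scalar $a_{u,u}$, and uses only matrix-vector and outer products, which is where the quadratic cost comes from.

```python
import numpy as np

def extend_inverse(A_inv, a_u, a_uu):
    """Inverse of [[A, a_u], [a_u^*, a_uu]] from A^{-1}, following (12)-(13)."""
    q = A_inv @ a_u                          # eq. (12), O(n^2)
    s = np.real(a_uu - a_u.conj() @ q)       # scalar a_uu - a_u^* q
    n = len(a_u) + 1
    out = np.zeros((n, n), dtype=complex)
    out[:-1, :-1] = A_inv + np.outer(q, q.conj()) / s
    out[:-1, -1] = -q / s
    out[-1, :-1] = -q.conj() / s
    out[-1, -1] = 1.0 / s
    return out

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))) / np.sqrt(2)
Hs, h_u = H[:3, :], H[3, :]                  # current set and candidate user

A_inv = np.linalg.inv(Hs @ Hs.conj().T)
a_u = Hs @ h_u.conj()                        # entries h_{s_i} h_u^*
a_uu = np.real(h_u @ h_u.conj())             # h_u h_u^*

updated = extend_inverse(A_inv, a_u, a_uu)
direct = np.linalg.inv(H @ H.conj().T)
agree = np.allclose(updated, direct)
```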

Given a set $S_n$, we have [1]

$$c_i(S_n) = \left\| \mathbf{h}_{s_i}\, \mathbf{P}(S_n \setminus \{s_i\})^{\perp} \right\|^2, \tag{14}$$

where $\mathbf{P}(S_n)^{\perp}$ denotes the projector onto the orthogonal
complement of $\mathcal{S}(S_n) = \mathrm{span}\{\mathbf{h}_{s_i} : s_i \in S_n\}$. It follows
that if (8) and (11) yield $p_u = 0$, then
$R_{zf}(S_{n-1} \cup \{u\}) \le R_{zf}(S_{n-1})$. We discard such $u$. We also discard $u$ if (8)
and (11) yield $p_{s_i} = 0$ for some $s_i \in S_{n-1}$. This is done to
keep complexity at bay, for otherwise combinatorial search
might effectively emerge. Hence, user $u$ is a candidate for
$S_n$ if $p_i > 0$, $\forall i \in S_{n-1} \cup \{u\}$. From the properties of
water-filling, this holds if


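$$\frac{n}{c_{i^*}(S_{n-1} \cup \{u\})} < P + \sum_{i \in S_{n-1} \cup \{u\}} \frac{1}{c_i(S_{n-1} \cup \{u\})}, \tag{15}$$

where $c_{i^*}(S_{n-1} \cup \{u\}) = \min_i c_i(S_{n-1} \cup \{u\})$.
Then, we have

$$\mu = \frac{1}{n}\left[P + \sum_{i} \frac{1}{c_i(S_{n-1} \cup \{u\})}\right]. \tag{16}$$

If (15) is not satisfied, we skip to the next $u$. The overall
complexity of the algorithm is $O(N^3 M)$.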
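Conditions (15) and (16) amount to a constant-time test once the gains $c_i$ are available. The sketch below, with hypothetical function names, checks whether every user in a candidate set would receive positive power and, if so, returns the water level in closed form.

```python
import numpy as np

def all_active(c, P):
    """Condition (15): n / c_min < P + sum_i 1/c_i."""
    c = np.asarray(c, dtype=float)
    return len(c) / c.min() < P + np.sum(1.0 / c)

def water_level(c, P):
    """Eq. (16): mu = (P + sum_i 1/c_i) / n, valid only when all_active(c, P) holds."""
    c = np.asarray(c, dtype=float)
    return (P + np.sum(1.0 / c)) / len(c)

# Candidate set passes: 2 / 0.5 = 4 < 3 + 3 = 6, so mu = 3 and all powers are positive.
c_ok = np.array([1.0, 0.5])
ok = all_active(c_ok, P=3.0)
mu = water_level(c_ok, P=3.0)
p = c_ok * (mu - 1.0 / c_ok)

# Candidate set fails: 2 / 0.1 = 20 is not below 1 + 11 = 12, so this u is skipped.
rejected = not all_active([1.0, 0.1], P=1.0)
```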

We note that the break in Step 2 is necessary when GZF
is used, but redundant when ZF-DP is used; it is shown in
[1,5] that in the latter case, maximum sum rate can always
be achieved with $N$ active users if $P > 0$ [1]. On the other
hand, when ZF alone is used, the optimum number of active
users is $n_{opt} \le N$ and decreases as $P$ decreases, so that for
$P \to 0$, the ZF scheme reduces to maximum ratio combining
(MRC), $n_{opt} = 1$ [1]. This also holds for the proposed
GZF algorithm, which follows from the water-filling equation
in (8) and the fact that $[c_1(S_1)]^{-1} = \max\{\ldots$
3. SIMULATION RESULTS

The performance of the proposed algorithm is presented in
Fig. 1. The y-axis shows sum capacity and sum rate in
bits per channel use. The x-axis shows total power in dB.
Noise level of every user is 1. Sum capacity and sum rates
are averaged over 100 channels. Channels are complex-valued,
drawn from an i.i.d. Rayleigh distribution with unit
variance for each channel entry. Note that GZF exhibits the
same slope of rate increase per dB of SNR as the sum capacity
curve at moderate and high SNR. Also note that, given
$N$, an increase in $M$ narrows the gap between the sum rate
achieved using GZF and the sum capacity. This is due to
multiuser diversity: the more users that contend for transmission,
the higher the probability that $N$ of them will be
almost orthogonal. This in turn reduces the advantage of
DP-coding based schemes over ZF.

Fig. 1. GZF Performance

4. CONCLUSIONS

We have proposed a low-complexity algorithm for downlink
transmission in the GBC for the realistic case wherein
the number of users is greater than the number of transmit
antennas. We have evaluated the throughput performance
of the new algorithm via simulations. The results show that
ZF beamforming with the proposed user selection method
achieves a significant fraction of sum capacity, at a low
complexity cost. The simulation results indicate that GZF
achieves the same slope of throughput per dB of SNR as the
capacity-achieving strategy based on the use of DP coding
for known interference cancellation and convex optimization.
Due to its simplicity, low complexity, and close to optimal
performance, the proposed method offers an attractive
alternative to earlier DP-based methods.